ABSTRACT

Dependencies have played a significant role in database design for 
many years. They have also been shown to be useful in query 
optimization. In this paper, we discuss dependencies between 
lexicographically ordered sets of tuples. We introduce formally 
the concept of order dependency and present a set of axioms 
(inference rules) for them. We show how query rewrites based on 
these axioms can be used for query optimization. We present 
several interesting theorems that can be derived using the 
inference rules. We prove that functional dependencies are 
subsumed by order dependencies and that our set of axioms for 
order dependencies is sound and complete. 

1. INTRODUCTION 

Consider the following SQL query.

Example 1.

select D.year, D.quarter, D.month,
       sum(S.sales) as total
from Dates D, Sales S
where D.date_id = S.date_id
  and D.year between 2001 and 2004
group by D.year, D.quarter, D.month
order by D.year, D.quarter, D.month

In the schema, Dates is a dimension table with a row per day,
and Sales is a very large fact table recording all individual sales.
Each has a surrogate-valued column date_id, which is the
primary key for Dates. In the Dates dimension table, each row
describes a given day with explicit columns such as year, quarter,
month, and day that describe the natural date values (and
additional columns that qualify that day, such as whether it is a
weekend day or holiday).

Assume we have a tree index for Dates on year, month, day. 
This index cannot help in a query plan, however, to accomplish 
the group-by because quarter intercedes. Of course, quarter 
is logically redundant here, as month (which follows it in the 
group-by) functionally determines quarter. (First quarter 
encompasses the months of January, February, and March, second 
quarter, the months of April, May, and June, and so forth.) The
query's author could not leave quarter out of the group-by -
even if they realize it would be better to - because it is stated in the
select. The query optimizer could, however, use an index scan
to have the tuple stream in year, month order to accomplish the
group by on year, quarter, month, if it recognizes that
year, month and year, quarter, month offer the same
partition. Query optimizers today perform this rewrite when the
functional dependency (FD) month $\rightarrow$ quarter is available
to the optimizer.

For the query above, the rewrite might still not be applied, since 
the query specifies the answers to be ordered by year, 
quarter, month. The FD month $\rightarrow$ quarter is not
logically sufficient to eliminate quarter from the order-by, as it 
was to eliminate it from the group-by. Since a query plan must 
guarantee the order-by, it likely will include a sort operator for 
year, quarter, month, after all. 

To see that the FD does not suffice to eliminate quarter from 
the order-by, imagine the values for quarter were the strings 
first, second, third, and fourth. Data would be lexicographically 
ordered as first, fourth, second, then third! Of course, we intend
that values of quarter are, say, 1, 2, 3, and 4, so the data would 
order naturally as by date. It is unfortunate, then, that quarter 
is, in fact, redundant (in this query) in the order-by also, but that 
the optimizer does not have the means to eliminate it. 
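
The effect is easy to reproduce. The following is a minimal sketch, assuming the quarter values are stored as the strings named above; the variable names are illustrative only.

```python
# Quarter labels in calendar order, stored as strings (an assumption for illustration).
quarters_by_date = ["first", "second", "third", "fourth"]

# Lexicographic (string) ordering, as an order-by on a string column would apply.
lexicographic = sorted(quarters_by_date)

print(lexicographic)  # ['first', 'fourth', 'second', 'third']

# With integer codes 1..4, the sort order agrees with calendar order.
codes_by_date = [1, 2, 3, 4]
assert sorted(codes_by_date) == codes_by_date
```

The functional dependency from month to quarter holds in both encodings; only the integer encoding also preserves order.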

What is missing is the semantic information that month orders 
quarter, which is more than just that month functionally 
determines quarter. This states that as values rise from one 
tuple to another on month, they must rise, or stay the same, from 
the one tuple to the other on quarter (that is, the values do not 
descend from the one tuple to the other on quarter). These 
have been called order dependencies (ODs), in contrast to 
functional dependencies. Our objective is to bring reasoning about 
order dependencies into the query optimizer. A query plan for the 
query above could then eliminate quarter from both the order- 
by and the group-by clauses, and the index on year, month, 
day might then provide for an efficient plan with no need for a 
sort operator. 

The notion of order dependencies can be greatly generalized, and 
the potential use of them in query optimization shown to be vast. 
The relationships between ordered sets have been explored in the 
past and several different notions of order have been considered. 
In this work, we consider just lexicographical ordering of tuples, 
as by the order-by operator in SQL, because this is the notion of 
order used in SQL and within query optimization for tuple 
streams.

The contribution of this paper is to present an axiomatization for
order dependencies, analogous to Armstrong's axiomatization for 
FDs [1]. This provides a formal framework for reasoning about 
ODs. There are two reasons for one to pursue an axiomatization. 

1. The axioms provide insight into how dependencies 
behave - and patterns for how dependencies logically 
follow from others - that are not easily evident 
reasoning from first principles. 

2. A sound and complete axiomatization is the first 
necessary step to designing an efficient inference 
procedure. 

Our axioms for ODs help us explore beneficial query rewrites. 
We show how they can be cast as a new type of integrity 
constraint to be used in query optimization. We derive theorems 
based on our axioms, which illustrate surprising inferences and 
equivalences over ODs, and which can provide for powerful query 
rewrites. While ODs for databases have been explored before, we 
present the first general axiomatization for them. We prove the 
soundness of the axioms. We demonstrate that Armstrong's 
axiomatization for FDs is subsumed within our axiomatization for 
ODs. (In this sense, ODs can be thought of as a generalization of 
FDs.) We then prove the completeness of the set of axioms. 
Working with ODs is more involved than with FDs because the 
order of the attributes matters. Thus, we must work with lists of 
attributes instead of with sets. This necessarily complicates our 
axioms - compared with Armstrong's axioms for FDs - and the 
proofs of our theorems. 

Outline. In Section 2, we present order dependencies (ODs) formally. We provide
background, our notational conventions, and definitions for ODs 
(Section 2.1). We show from where ODs in databases naturally 
arise (Section 2.2). We demonstrate a number of effective ways 
ODs may be used in query optimization (Section 2.3). We discuss 
a query optimization technique with ODs that we have 
implemented as a prototype in IBM DB2 [18], and our ongoing 
work with these techniques. In Section 3, we introduce the 
axiomatization for ODs (Section 3.1), and we prove the soundness 
of the axioms (Section 3.2). We derive a collection of theorems 
using our axioms - which we use in the proof of completeness - 
which illustrate the utility of our axioms (Section 3.3). In Section 
4, we prove the completeness of the axiomatization. We sketch 
our proof of completeness (Section 4.1). We demonstrate how 
FDs are subsumed within order dependencies (Section 4.2). With 
the requisite pieces in place, we present the formal proof of 
completeness of the axiomatization (Section 4.3). In Section 5, we 
discuss related work. In Section 6, we present plans for future 
work and make concluding remarks. This work, we feel, opens
exciting avenues for future work to develop a powerful new family
of query optimization techniques in database systems.

2. ORDER DEPENDENCY 

We first set out formal definitions for order dependencies that we 
need later in proofs. Next, we illustrate ODs in databases and how 
they arise. We then show the use-case scenarios for ODs for query 
optimization. 

2.1 Formal Definitions 

We adopt the notational conventions in Table 1. We consider a
relation $\mathbf{R}$ with a schema set of attributes $\mathcal{U}$. Let $\mathbf{r}$ be an arbitrary
table instance over $\mathbf{R}$; thus a set of tuples under $\mathbf{R}$'s schema with
attributes $\mathcal{U}$. We limit table instances to sets in our definitions, to
keep our definitions simpler and easier to follow. However, this
could be changed to multi-sets easily, with no consequences to our
axiomatization.

Table 1. Notational conventions.

Relations

• A capital letter in bold italics represents a relation: $\mathbf{R}$.

• A small letter in bold represents a relational instance
(a table): $\mathbf{r}$.

• We use capital letters to represent single attributes:
$A$, $B$, $C$. Tuples are marked with small letters in italics: $s$, $t$.

Sets

• Calligraphic letters stand for sets of attributes: $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$.

• We use proximity for union of sets: $\mathcal{XY}$ is shorthand
for $\mathcal{X} \cup \mathcal{Y}$. Likewise, $A\mathcal{X}$ or $\mathcal{X}A$, where $\mathcal{X}$ is a set of
attributes and $A$ a single attribute, stands for $\mathcal{X} \cup \{A\}$.

• Also $t_{\mathcal{X}}$ denotes the projection of the tuple $t$ on the
attributes of $\mathcal{X}$, while $t_A$ is the shorthand for $t_{\{A\}}$.

Lists

• Bold letters stand for lists of attributes: $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$. Note list
$\mathbf{X}$ could be the empty list, $[\,]$.

• We use square brackets to denote a list: $[A, B, C]$. The
notation $[A \mid \mathbf{T}]$ denotes that $A$ is the head of the list, and
$\mathbf{T}$ is the tail of the list, the remaining list with the first
element removed.

• Proximity is used for concatenation of lists of attributes:
$\mathbf{XY}$ is shorthand for $\mathbf{X} \circ \mathbf{Y}$. Likewise, $A\mathbf{X}$ and $\mathbf{X}A$ stand
respectively for $[A] \circ \mathbf{X}$ and $\mathbf{X} \circ [A]$, where $\mathbf{X}$ is a list of
attributes and $A$ is a single attribute. $AB$ denotes $[A, B]$.

• $\mathbf{X}'$ denotes some other permutation of elements of list $\mathbf{X}$.

Definition 1. (operator $\preceq$) Let $\mathbf{X}$ be a list of attributes, $s$ and $t$ be
two tuples in relation instance $\mathbf{r}$. Operator $\preceq$ is defined as follows:

$$s_{\mathbf{X}} \preceq t_{\mathbf{X}} \ \text{ where } \mathbf{X} = [A \mid \mathbf{T}] \ \text{ if } (s_A < t_A) \ \text{ or if } \big((s_A = t_A) \text{ and } (\mathbf{T} = [\,] \text{ or } s_{\mathbf{T}} \preceq t_{\mathbf{T}})\big)$$

In this paper, we assume ascending (asc) order in the
lexicographical ordering. (This is SQL's default.) We do not
consider descending (desc) orders, mixing of asc and desc
(e.g., order by X desc, Y asc) [19], or use of functions in
the order directives (e.g., order by -1*X asc, Y asc).

Definition 2. (operator $\prec$) Let $\mathbf{X}$ be a list of attributes, $s$ and $t$ be
two tuples in relation instance $\mathbf{r}$. The operator $\prec$ is defined as
follows: $s_{\mathbf{X}} \prec t_{\mathbf{X}}$ iff $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ and $t_{\mathbf{X}} \not\preceq s_{\mathbf{X}}$.

Definition 3. ($s_{\mathbf{X}} = t_{\mathbf{X}}$) Let $\mathbf{X}$ be a list of attributes, $s$ and $t$ be
two tuples in relation instance $\mathbf{r}$. $s_{\mathbf{X}} = t_{\mathbf{X}}$ iff $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ and $t_{\mathbf{X}} \preceq s_{\mathbf{X}}$.

Definition 4. (order dependency) Let $\mathbf{X}$ and $\mathbf{Y}$ be lists of
attributes. Call $\mathbf{X} \mapsto \mathbf{Y}$ an order dependency (OD) over the
relation $\mathbf{R}$ if, for every pair of admissible tuples $s$ and $t$ in relation
instance $\mathbf{r}$ over $\mathbf{R}$, $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ implies $s_{\mathbf{Y}} \preceq t_{\mathbf{Y}}$.

Whenever $\mathbf{X} \mapsto \mathbf{Y}$, we say that $\mathbf{X}$ orders $\mathbf{Y}$. $\mathbf{X}$ and $\mathbf{Y}$ are order
equivalent iff $\mathbf{X} \mapsto \mathbf{Y}$ and $\mathbf{Y} \mapsto \mathbf{X}$. We denote this by $\mathbf{X} \leftrightarrow \mathbf{Y}$.

Figure 1. Relation instance $\mathbf{r}$.

Example 2. Note that $[A, B, C] \mapsto [F, E, D]$ is consistent with $\mathbf{r}$,
but $[A, B, C] \mapsto [F, D, E]$ is falsified by $\mathbf{r}$ in Figure 1.

The OD $\mathbf{X} \mapsto \mathbf{Y}$ means that $\mathbf{Y}$'s values are monotonically non-decreasing
with respect to $\mathbf{X}$'s values. Thus, if a list of tuples is
ordered by $\mathbf{X}$, then it is also necessarily ordered by $\mathbf{Y}$, but not
necessarily vice versa. That is to say, if one knows $\mathbf{X} \mapsto \mathbf{Y}$, then
one knows that any ordering of the tuples of $\mathbf{r}$, for any $\mathbf{r}$, that
satisfies order by $\mathbf{X}$ also satisfies order by $\mathbf{Y}$.
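
Definitions 1 and 4 can be checked by brute force on a small instance. The following is a minimal sketch, not an efficient procedure: the function names `lex_leq`, `holds_od`, and `order_equivalent` and the month/quarter rows are assumptions made for illustration, and the empty attribute list is treated as always satisfied, a case Definition 1 leaves open.

```python
from itertools import product

def lex_leq(s, t, attrs):
    """Definition 1: lexicographic comparison of s and t on the list attrs."""
    for a in attrs:
        if s[a] < t[a]:
            return True
        if s[a] > t[a]:
            return False
    return True  # equal on every attribute (assumed also for the empty list)

def holds_od(rows, lhs, rhs):
    """Definition 4: for every pair s, t, s <= t on lhs implies s <= t on rhs."""
    return all(lex_leq(s, t, rhs)
               for s, t in product(rows, repeat=2)
               if lex_leq(s, t, lhs))

def order_equivalent(rows, x, y):
    """Order equivalence: each list orders the other."""
    return holds_od(rows, x, y) and holds_od(rows, y, x)

# Hypothetical instance in which quarter is derived from month.
rows = [{"month": m, "quarter": (m - 1) // 3 + 1} for m in range(1, 13)]

print(holds_od(rows, ["month"], ["quarter"]))  # True
print(holds_od(rows, ["quarter"], ["month"]))  # False
```

The check compares all pairs of tuples, so it is quadratic in the size of the instance; it is meant only to make the definitions concrete.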

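There is a clear relationship between ODs and FDs. Any OD
implies an FD (modulo lists and sets), but not vice versa.

Lemma 1. (relationship between ODs and FDs). For every
instance $\mathbf{r}$ of relation $\mathbf{R}$, if OD $\mathbf{X} \mapsto \mathbf{Y}$ holds, then FD $\mathcal{X} \rightarrow \mathcal{Y}$ is
true.

Proof. Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{X}} = t_{\mathbf{X}}$. Therefore, $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ and
$t_{\mathbf{X}} \preceq s_{\mathbf{X}}$. By the definition of OD, $s_{\mathbf{Y}} \preceq t_{\mathbf{Y}}$ and $t_{\mathbf{Y}} \preceq s_{\mathbf{Y}}$, hence as
$s_{\mathbf{X}} = t_{\mathbf{X}}$, $s_{\mathbf{Y}} = t_{\mathbf{Y}}$. □

Definition 5. (order compatible) Two lists $\mathbf{X}$ and $\mathbf{Y}$ are order
compatible, denoted as $\mathbf{X} \sim \mathbf{Y}$, iff $\mathbf{XY} \leftrightarrow \mathbf{YX}$.

Example 3. Note that $[A, B] \sim [F, C]$ is consistent with $\mathbf{r}$, but
$[A, C] \sim [F, D]$ is falsified by $\mathbf{r}$ in Figure 1.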
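
Order compatibility can be tested the same way. This is a minimal sketch under the definition above (two lists are order compatible iff their two concatenations are order equivalent); the helper names and both sample instances are hypothetical and are not the instance of Figure 1.

```python
from itertools import product

def proj(row, attrs):
    """Projection of a row on an attribute list; Python compares tuples lexicographically."""
    return tuple(row[a] for a in attrs)

def holds_od(rows, lhs, rhs):
    """lhs orders rhs: s <= t on lhs implies s <= t on rhs, for all pairs."""
    return all(proj(s, rhs) <= proj(t, rhs)
               for s, t in product(rows, repeat=2)
               if proj(s, lhs) <= proj(t, lhs))

def order_compatible(rows, x, y):
    """x ~ y iff the concatenations x+y and y+x are order equivalent."""
    return holds_od(rows, x + y, y + x) and holds_od(rows, y + x, x + y)

# Hypothetical instance: quarter rises with month, so the two are compatible.
calendar = [{"month": m, "quarter": (m - 1) // 3 + 1} for m in range(1, 13)]

# Hypothetical instance with a swap: A rises while B falls.
swapped = [{"A": 1, "B": 2}, {"A": 2, "B": 1}]

print(order_compatible(calendar, ["month"], ["quarter"]))  # True
print(order_compatible(swapped, ["A"], ["B"]))             # False
```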

2.2 Order By 

The concept of functional dependencies has come to have 
profound importance in databases, especially in schema design. 
While functional dependencies are a simple notion in some ways, 
reasoning over them is, somewhat surprisingly, not nearly as 
simple. To gain insight into how sets of FDs behave, and to 
simplify the reasoning process over them, Armstrong provided an 
axiomatization for them [1]. Beyond layout and indexes, FDs play 
additional important roles in query optimization. Knowledge 
about prescribed FDs on the schema are used in the query-rewrite 
phase of optimization potentially to eliminate predicates. They are 
used in the cost-based phase to do better cardinality estimation. 
They are used also to recognize partitioning equivalences of tuple 
streams within query plans. 

We have introduced ODs in analogy to FDs: functional 
dependencies are to group-by as order dependencies are to order- 
by. On the one hand, order is not important in the pure relational 
model on the logical side of the fence. Relational instances are 
sets of tuples. (Implemented systems allow for multi-sets of 
tuples, but again, there is no notion of order.) A schema is a set of 
attributes. SQL concedes a single order-by clause to be appended 
to a query to order the result set, as a convenience, given that 
people often want to see the results sorted in a given way. (This 
said, there are many places where order is semantically 
meaningful. Data stream extensions to the relational model make 
order a part of the model. For other data models such as XML - 
and XQuery over it - order is an integral part of the model.) 

On the other hand, order plays pivotal roles on the physical side, 
in the physical database and in query optimization. Data is often 
stored sorted by a clustered (tree) index's key. In a query plan, an 
operator that takes as input the output stream of another operator 
can benefit in cases when the stream is sorted in a particular way. 
Aggregation queries (group-by) can be evaluated on-the-fly if the 
stream is ordered already in a way compatible with the requested 
group-by partition, rather than needing to do a partitioning 
operation that could involve heavy I/O expense. 



Given $\mathbf{X} \mapsto \mathbf{Y}$, if one has an SQL query with order by $\mathbf{Y}$, one
can rewrite the query with order by $\mathbf{X}$ instead, and meet the
intent of the original query. However, the rewritten query is not
semantically equivalent to the original (unless $\mathbf{X} \leftrightarrow \mathbf{Y}$)! One could
not legally rewrite a query with order by $\mathbf{X}$ to use order by $\mathbf{Y}$
instead. Strengthening the order-by conditions is permitted, but
weakening them is not. (This is true, too, inside query plans for
ordered tuple streams.)

One does not need order equivalences then to accomplish useful
query rewrites. Directional order dependencies (e.g., $\mathbf{X} \mapsto \mathbf{Y}$, but
not $\mathbf{Y} \mapsto \mathbf{X}$) suffice. This makes ODs that much more versatile for
rewrites. Notice this differs from the use of FDs for query
rewrites, for instance, to simplify group-bys. To replace year,
quarter, month by year, month in the group-by for the
query in the example in Section 1, one should know the two are
functionally equivalent. One could not replace it by year,
month, day, for example, even though {year, month, day}
$\rightarrow$ {year, quarter, month}.

Within query plans, group-by (partitions) can be accomplished 
either by a partition operation (such as by use of a hash index), or 
by the use of an ordered tuple stream (as provided by a tree-index 
scan or by a sort operation). When rewriting the partition criteria, 
if a partition operation is employed, the criteria must be 
equivalent. However, when an ordering operation is employed
instead, then one has the same flexibility as noted for ODs.
Strengthening the criteria suffices. For instance,
sorting by year, month, day would suffice to accomplish the 
group-by on year, quarter, month. (Group divisions can be 
found on-the-fly in the stream.) 
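
The on-the-fly grouping can be sketched as follows. This is an illustration only: the rows are invented, and the in-memory sort stands in for a tree-index scan that would deliver the stream already ordered by year, month, day.

```python
from itertools import groupby

# Hypothetical rows: (year, quarter, month, day, sales)
rows = [
    (2001, 1, 2, 14, 10.0),
    (2001, 2, 4, 3, 7.5),
    (2001, 1, 1, 20, 5.0),
    (2001, 1, 2, 1, 2.5),
    (2002, 1, 1, 9, 4.0),
]

# Stream ordered by year, month, day (stand-in for an index scan).
stream = sorted(rows, key=lambda r: (r[0], r[2], r[3]))

# groupby merges only adjacent rows with equal keys, so each group on
# (year, quarter, month) is emitted as soon as its last row has passed.
totals = [(key, sum(r[4] for r in grp))
          for key, grp in groupby(stream, key=lambda r: (r[0], r[1], r[2]))]

for key, total in totals:
    print(key, total)
```

Because month determines quarter, rows with the same (year, quarter, month) are contiguous in the stream, so no group is split and no separate partitioning pass is needed.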

An OD can be declared as an integrity constraint to prescribe 
which instances are admissible. (We have introduced this new 
type of constraint in a prototype branch of IBM DB2. See Section 
2.3.) One can reason over ODs on relations in a similar way one 
now reasons about FDs over relations. Some order dependencies 
are trivially true [20]. That is, they are (trivially) satisfied by any 
table instance. For example, consider $\mathbf{XY} \mapsto \mathbf{X}$. Others are not
trivial. If one knows a collection of order dependencies, $\mathcal{M}$ -
declared as integrity constraints over relation $\mathbf{R}$ - one might
soundly infer additional order dependencies that must be true
for $\mathbf{R}$. For example, if $\mathbf{X} \mapsto \mathbf{Y}$ and $\mathbf{Y} \mapsto \mathbf{Z}$ are true, then $\mathbf{X} \mapsto \mathbf{Z}$ is
true also. (That is, ODs are transitive.)

While order is not part of the relational model, per se, ordered 
value domains are of key importance for most databases, and most 
queries. Many types of ODs are apparent in the semantics of 
databases (even though these ODs are not declared explicitly). 
Perhaps the most important of these ordered domains in practice is 
time. Time and date (time at a coarser granularity) are richly 
supported in the SQL standards. The common benchmark TPC- 
DS has 99 queries. Of these, 85 involve date operators and 
predicates (and five involve time operators and predicates). This is 
common for data-warehouses. Even if we were just limited to 
ODs over the date/time domain, we could derive great benefits in 
query optimization. 

Figure 2 represents possible ODs, in which the left-hand side of a 
dependency is time and the right-hand side is one of the paths 
through the diagram. Each node is an equivalence class of the list of
attributes leading up to it, with respect to the starting point.
Theorem 10 proves that any list appearing on the left side can be
suffixed by attributes appearing along an equivalent path. This is
shown in Example 4.

Example 4. As [time] $\mapsto$ [date, hour] holds and
[date] $\mapsto$ [year, month, day], it follows (from Theorem 10
below) that [time] $\mapsto$ [date, month, hour].

[Diagram not reproduced; surviving node labels: Time, Quarter, Date.]

Figure 2. Time diagram.

Order dependencies are not just limited to the time domain, 
however. They arise naturally in many other domains from the 
real-world semantics associated with given data. All that is 
required is that the values of a column (or list of columns) are 
monotonically non-decreasing with respect to the values of 
another column (or list of columns). This property is fairly 
common when columns are functionally related. 

Example 5. Consider a table Taxes that includes columns for 
taxable income, tax bracket, and taxes on the income. The 
tax brackets are based on the level of income (and so rise with
income level). Assume taxes go up with income. Then,
[income] $\mapsto$ [bracket] and [income] $\mapsto$ [taxes]. It follows
(from Theorem 2 below) that [income] $\mapsto$ [bracket, taxes].

Assume the table has a tree index on income. Given a query on 
the table with an order-by on bracket, taxes, with the OD 
above, it could be evaluated using the index on income. 
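
A small sketch of Example 5 follows. The bracket thresholds and the flat 20% rate are hypothetical stand-ins chosen only so that both columns are non-decreasing in income; they are not taken from any real tax schedule.

```python
def bracket(income):
    """Hypothetical three-bracket schedule, non-decreasing in income."""
    if income < 10_000:
        return 1
    if income < 50_000:
        return 2
    return 3

def taxes(income):
    """Hypothetical flat 20% tax in whole currency units, non-decreasing in income."""
    return income // 5

incomes = [72_000, 8_000, 50_000, 9_999, 31_000, 10_000]
table = [{"income": i, "bracket": bracket(i), "taxes": taxes(i)} for i in incomes]

# Read the rows in income order, as a scan of a tree index on income would.
by_income = sorted(table, key=lambda r: r["income"])

# The same stream is already ordered by (bracket, taxes): no extra sort is needed.
keys = [(r["bracket"], r["taxes"]) for r in by_income]
assert keys == sorted(keys)
```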

Instead of being columns with explicit data, bracket and 
taxes could be derived by functions or case expressions - say, if 
Taxes were a view - or generated columns in the table. In these 
cases, it would be possible for the database system to derive the 
order-dependency constraints above automatically. In [12], it was 
shown how to derive such monotonicity "constraints" from 
generated columns via algebraic expressions (in IBM DB2). Of 
course, one could prescribe the set of order dependencies as check 
constraints directly to benefit by this technique. 

Such monotonic dependencies can be derived from built-in SQL 
functions, from user-defined functions (to some degree), and from 
case expressions. The SQL function Year, for example, extracts 
the year component of a datestamp. Thus, given a datestamp 
column when, [when] $\mapsto$ [Year(when)].

2.3 Optimization 

In [17], the authors expounded on the important role of order in 
query optimization. They demonstrated numerous examples of 
how better reasoning over interesting orders in the query 
optimizer could lead to significantly better performing query 
plans. They introduced query rewrites in IBM DB2 that could 
replace one labeled interesting order by another, when it is known 
the two order in the same way (that is, are order equivalent, as we 
have defined it). 



They showed how these rewrites could allow the optimizer to 
consider additional query plans that process join, order-by, group- 
by, and distinct operators more efficiently. By recognizing that a 
tuple stream ordered with respect to some criteria is equivalently 
ordered with respect to other criteria, a sort on input can be 
removed for a sort-merge join. Order-by and group-by operators 
can be satisfied with no need for a sorting or partitioning 
operation more often, as with our Example 1. Likewise, as the 
distinct operator is exchangeable with group-by, the need for a 
sorting or partitioning operation to satisfy distinct can be lessened. 

Our work builds upon this work. Their rewrites rely on functional 
dependency information available to the optimizer, but do not 
exploit any order dependency semantics, as defined by us. Our 
work permits a greater range of rewrites. For example, they could 
reduce an order-by year, month, quarter to an order-by
year, month, based upon the FD {month} $\rightarrow$ {quarter}.
(Likewise, they could reduce the equivalent group-by.) However, 
they could not reduce the order-by year, quarter, month to 
year, month, as we did in Example 1, since their techniques do 
not employ the idea of ODs. (It is Theorem 8 below, called Left 
Eliminate, which follows from our axiomatization, which justifies 
this rewrite.) 

In [17], they introduced a rewrite algorithm for order-by called 
Reduce Order. It sweeps the order-by attribute list from right to 
left, seeking to eliminate attributes. Each iteration through the list, 
the prefix set with respect to the current attribute - that is, the set 
of attributes to the left of the current - is checked to see whether it 
functionally determines the current attribute. If so, the attribute is 
dropped from the list. 

We can augment that algorithm - call it Reduce Order* - to do an 
additional step. Each iteration through the list, it can additionally 
be checked whether any postfix list with respect to the current 
attribute - that is, a list of attributes to the right of the current - 
orders the current attribute. If so, the attribute is dropped from the 
list. Given the OD [month] $\mapsto$ [quarter], both order-by year,
month, quarter and year, quarter, month would be 
reduced to year, month. 
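
The augmented sweep might look as follows. This is a sketch under simplifying assumptions, not the algorithm of [17] nor the prototype's code: the names `fd_closure` and `reduce_order_star` are hypothetical, FDs are given as pairs of attribute sets, and each OD is given as a pair (list, attribute) meaning that the list orders that single attribute. Only declared ODs are matched; no inference over them is attempted.

```python
def fd_closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def reduce_order_star(order_by, fds, ods):
    """Sweep right to left, dropping attributes made redundant by an FD or an OD."""
    out = list(order_by)
    for i in range(len(out) - 1, -1, -1):
        attr = out[i]
        # Original step: the prefix set functionally determines the attribute.
        if attr in fd_closure(out[:i], fds):
            del out[i]
            continue
        # Added step: a declared OD whose left-hand list sits directly to the
        # right of the attribute orders it.
        for lhs, rhs_attr in ods:
            if rhs_attr == attr and out[i + 1:i + 1 + len(lhs)] == list(lhs):
                del out[i]
                break
    return out

fds = [({"month"}, {"quarter"})]
ods = [(["month"], "quarter")]
print(reduce_order_star(["year", "month", "quarter"], fds, ods))  # ['year', 'month']
print(reduce_order_star(["year", "quarter", "month"], fds, ods))  # ['year', 'month']
```

Deleting at position i while iterating downward leaves the positions still to be visited unchanged, which is why a plain index loop suffices here.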

Order dependencies are in terms of lists of attributes, not sets as 
for functional dependencies. This makes matching in rewrites 
using ODs more complex generally, but also increases the 
possibilities for matches. Consider $D \mapsto B$. Then $ABD$ could be
reduced to $AD$. However, $ABCD$ cannot be! The attribute $C$
intervening between the $B$ and $D$ invalidates the rewrite. For
the rewrite by Theorem 8 to apply, the list on the right-hand side
of the OD must precede directly the list on the left-hand side. If
we knew $D \mapsto BC$, then $ABCD$ could be reduced to $AD$.

A major part of our continued work with order dependencies is to 
develop a number of efficient rewrite rules for the query 
optimizer, as they did in [17], to exploit ODs effectively. Our OD 
axiomatization provides us the means now to pursue this. The 
axioms and related theorems as in Section 3.3 provide us with 
insight into the types of rewrites that are possible. In [18], we 
developed query rewrites in a prototype branch of the IBM DB2 
9.7 codebase that demonstrates the effectiveness of rewrites using 
order equivalences. In data-warehouses, date is often represented 
explicitly as a dimension table of its own, with the primary key of 
the date table made as a surrogate key [11]. While this design can 
have compelling advantages, the surrogate key can cause 
problems for efficiently evaluating queries.

A majority of queries in a data warehouse are over the fact table.
A query often uses natural date values in predicates. However,
date in the fact table is recorded by the surrogate key. This
necessitates potentially a quite expensive join between the fact 
table and the date dimension table when the query is evaluated. 
There is an additional problem when a fact table has been 
partitioned by date in order to accommodate a very large table 
(e.g., in distributed systems). Since the date range (surrogate 
values) over the fact table cannot be determined from the query 
(natural values), all partitions of the fact table must be scanned. 
We optimize such queries involving dates by removing this join, 
and choosing just the relevant partitions of the fact table when the 
table is distributed. 

A number of queries in the TPC-DS benchmark have this 
condition. Fortunately, we have a guarantee (an OD) that the 
surrogate (date) keys in the date dimension table are ordered in 
the same way as natural date values in the dimension table. Thus, 
the query plan can make two probes into the dimension table to 
calculate the range of the surrogate keys from the fact table. These 
two probes into the date table find mindate and maxdate 
surrogate key values. These two surrogate key values replace the 
range predicate, which allows the index on the date column in the 
fact table to be used. 
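
The idea of the two probes can be sketched as below. The tables, key values, and the function name `surrogate_range` are hypothetical, and a linear scan stands in for what would be two index probes in a real system; the details of the actual rewrite are in [18].

```python
from datetime import date

# Hypothetical Dates dimension: the surrogate date_id rises with the natural date.
dates = [
    {"date_id": 101, "day": date(2000, 12, 31)},
    {"date_id": 102, "day": date(2001, 1, 1)},
    {"date_id": 103, "day": date(2001, 6, 15)},
    {"date_id": 104, "day": date(2004, 12, 31)},
    {"date_id": 105, "day": date(2005, 1, 1)},
]
# Hypothetical Sales fact table, keyed by the surrogate only.
sales = [
    {"date_id": 101, "sales": 5},
    {"date_id": 102, "sales": 7},
    {"date_id": 104, "sales": 3},
    {"date_id": 105, "sales": 9},
]

def surrogate_range(dim, lo, hi):
    """Smallest and largest date_id whose natural date lies in [lo, hi], or None."""
    ids = [d["date_id"] for d in dim if lo <= d["day"] <= hi]
    return (min(ids), max(ids)) if ids else None

bounds = surrogate_range(dates, date(2001, 1, 1), date(2004, 12, 31))

# The natural-date range predicate becomes a range predicate on date_id,
# so the fact table is filtered without joining to the dimension.
total = sum(s["sales"] for s in sales
            if bounds and bounds[0] <= s["date_id"] <= bounds[1])

print(bounds, total)  # (102, 104) 10
```

The replacement is valid only because the surrogate keys are ordered in the same way as the natural dates; without that guarantee, key values inside the range could belong to dates outside it.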

The details of when and how this rewrite can be performed in a 
general way are provided in [18]. We built a prototype 
implementing such rewrites in IBM DB2 V.9.7 and performed 
experiments over TPC-DS to demonstrate the efficiency of the 
approach. Thirteen of TPC-DS's queries matched the conditions 
for this rewrite. Every one of these thirteen benefited, with an 
average performance gain of 48%. Since this work reported in 
[18], we have continued work on the prototype. We have added a 
new type of check constraint which expresses an OD. We have 
implemented more OD rewrite rules which now rewrite eighteen 
of TPC-DS's queries with performance gain. Consider our
technique from [18] combined with an OD rewrite of the order-by
for our query in Example 1. If we have the OD that
[date_id] $\mapsto$ [year, month], the order-by and group-by
operators in a query plan could be accomplished by an index scan
over the index for Sales, the fact table, on date_id, then
joining the results against the dimension table Dates.

3. AXIOMATIZATION 

A key concern in dependency theory is developing the algorithms 
for testing logical implication. Developing inference rules is an 
approach to show logical implication between dependencies. 

3.1 Axioms 

Definition 6. (A proof of OD $\theta$ from $\mathcal{M}$) Let $\mathcal{M}$ be a set of
prescribed ODs. A proof of OD $\theta$ from $\mathcal{M}$ with the set of
inference rules $\mathcal{J}$ is a sequence $\theta_1, \ldots, \theta_n$ ($n \geq 1$) with $\theta_n = \theta$ such that
for $k \in [1, n]$ either $\theta_k \in \mathcal{M}$, or there exists a substitution for
some rule $\varphi \in \mathcal{J}$, such that $\theta_k$ is the consequence of $\varphi$, and such that
for each order dependency in the predecessor of $\varphi$ the
corresponding order dependency is in the set $\{\theta_i \mid 1 \leq i < k\}$.

The OD $\theta$ is provable from $\mathcal{M}$ using axioms $\mathcal{J}$ (relative to set of
attributes $\mathcal{U}$), denoted $\mathcal{M} \vdash_{\mathcal{J}} \theta$, if there is a proof of $\theta$ from $\mathcal{M}$
using $\mathcal{J}$. We now introduce axioms (inference rules) for ODs.

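Definition 7. (OD axioms) The inference rules for ODs are as
follows.

OD1: Reflexivity

$$\mathbf{XY} \mapsto \mathbf{X}$$

OD2: Prefix

$$\frac{\mathbf{X} \mapsto \mathbf{Y}}{\mathbf{ZX} \mapsto \mathbf{ZY}}$$

OD3: Normalization

$$\mathbf{WXYXV} \leftrightarrow \mathbf{WXYV}$$

OD4: Transitivity

$$\frac{\mathbf{X} \mapsto \mathbf{Y} \qquad \mathbf{Y} \mapsto \mathbf{Z}}{\mathbf{X} \mapsto \mathbf{Z}}$$

OD5: Suffix

$$\frac{\mathbf{X} \mapsto \mathbf{Y}}{\mathbf{X} \leftrightarrow \mathbf{YX}}$$

OD6: Chain

$$\frac{\mathbf{X} \sim \mathbf{Y}_1 \ \wedge\ \forall_{i \in [1, n-1]}\ \mathbf{Y}_i \sim \mathbf{Y}_{i+1} \ \wedge\ \mathbf{Y}_n \sim \mathbf{Z} \ \wedge\ \forall_{i \in [1, n]}\ \mathbf{Y}_i\mathbf{X} \sim \mathbf{Y}_i\mathbf{Z}}{\mathbf{X} \sim \mathbf{Z}}$$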

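Two of our axioms generate trivial dependencies [20]: Reflexivity
and Normalization. We define the closure of the set of ODs $\mathcal{M}$,
denoted $\mathcal{M}^+$, to be the set of ODs that are logically implied
by $\mathcal{M}$.

Definition 8. (closure of $\mathcal{M}$ using $\mathcal{J}$). Let $\mathcal{J}$ = {OD1-OD6},
then $\mathcal{M}^+ = \{\mathbf{X} \mapsto \mathbf{Y} \mid \mathcal{M} \vdash_{\mathcal{J}} \mathbf{X} \mapsto \mathbf{Y}\}$.

Definition 9. (equivalent sets of ODs). Let $\mathcal{M}$ and $\mathcal{M}'$ be sets
of ODs. We say that $\mathcal{M}$ and $\mathcal{M}'$ are equivalent iff
$\{\mathbf{X} \mapsto \mathbf{Y} \mid \mathcal{M} \models \mathbf{X} \mapsto \mathbf{Y}\} = \{\mathbf{X} \mapsto \mathbf{Y} \mid \mathcal{M}' \models \mathbf{X} \mapsto \mathbf{Y}\}$.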

3.2 Soundness 

In this subsection, we address the problem of showing that our 
OD axioms are sound. This is to say, they lead only to true 
conclusions. 

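Definition 10. (soundness of OD axioms) Let $\mathcal{J}$ be the set of
inference rules {OD1-OD6}. Then $\mathcal{J}$ is sound for logical
implication of ODs if, whenever $\mathbf{X} \mapsto \mathbf{Y}$ is deduced from $\mathcal{M}$
using axioms $\mathcal{J}$ ($\mathcal{M} \vdash_{\mathcal{J}} \mathbf{X} \mapsto \mathbf{Y}$), $\mathbf{X} \mapsto \mathbf{Y}$ is true in any relation in which the
dependencies of $\mathcal{M}$ are true ($\mathcal{M} \models \mathbf{X} \mapsto \mathbf{Y}$).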

Let r be a relation over R. The following Lemmas are true. 

Lemma 2. (soundness of Reflexivity) Reflexivity is sound. 

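Proof. Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{XY}} \preceq t_{\mathbf{XY}}$. From the
recursiveness of Definition 1 of operator $\preceq$ it follows that (1)
$s_{\mathbf{X}} = t_{\mathbf{X}}$ and $s_{\mathbf{Y}} \preceq t_{\mathbf{Y}}$ or (2) $s_{\mathbf{X}} \prec t_{\mathbf{X}}$. (1) and (2) imply that $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$,
therefore $\forall \mathbf{r}.\ \mathbf{XY} \mapsto \mathbf{X}$. □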

Lemma 3. (soundness of Prefix) Prefix is sound. 

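Proof. Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{ZX}} \preceq t_{\mathbf{ZX}}$. This implies (1)
$s_{\mathbf{Z}} \prec t_{\mathbf{Z}}$ or (2) $s_{\mathbf{Z}} = t_{\mathbf{Z}}$ and $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$. For (1), $s_{\mathbf{ZY}} \preceq t_{\mathbf{ZY}}$ holds as
$s_{\mathbf{Z}} \prec t_{\mathbf{Z}}$. In the second scenario (2), $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ implies $s_{\mathbf{Y}} \preceq t_{\mathbf{Y}}$
($\mathbf{X} \mapsto \mathbf{Y}$ is given). Hence, as $s_{\mathbf{Z}} = t_{\mathbf{Z}}$, it is true that $s_{\mathbf{ZY}} \preceq t_{\mathbf{ZY}}$.
$\forall \mathbf{r}.\ \mathbf{X} \mapsto \mathbf{Y}$ implies $\mathbf{ZX} \mapsto \mathbf{ZY}$. □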

Lemma 4. (soundness of Normalization) Normalization is 
sound. 

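Proof. (IF) Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{WXYV}} \preceq t_{\mathbf{WXYV}}$. This implies
that: (1) $s_{\mathbf{WXY}} = t_{\mathbf{WXY}}$ and $s_{\mathbf{V}} \preceq t_{\mathbf{V}}$ or (2) $s_{\mathbf{WXY}} \prec t_{\mathbf{WXY}}$. In (1),
$s_{\mathbf{X}} = t_{\mathbf{X}}$ as $s_{\mathbf{WXY}} = t_{\mathbf{WXY}}$. Therefore we can suffix $\mathbf{WXY}$ by list $\mathbf{X}$
and $s_{\mathbf{WXYX}} = t_{\mathbf{WXYX}}$ holds. Hence, $s_{\mathbf{WXYXV}} \preceq t_{\mathbf{WXYXV}}$ as we know
that $s_{\mathbf{V}} \preceq t_{\mathbf{V}}$. In scenario (2), $s_{\mathbf{WXY}} \prec t_{\mathbf{WXY}}$ implies that we can
suffix list $\mathbf{WXY}$ by $\mathbf{XV}$ and $s_{\mathbf{WXYXV}} \preceq t_{\mathbf{WXYXV}}$ holds.

(ONLY IF) Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{WXYXV}} \preceq t_{\mathbf{WXYXV}}$. This implies
that: (1) $s_{\mathbf{WXY}} = t_{\mathbf{WXY}}$ and $s_{\mathbf{XV}} \preceq t_{\mathbf{XV}}$ or (2) $s_{\mathbf{WXY}} \prec t_{\mathbf{WXY}}$. In (1),
$s_{\mathbf{X}} = t_{\mathbf{X}}$ as $s_{\mathbf{WXY}} = t_{\mathbf{WXY}}$. Hence, $s_{\mathbf{V}} \preceq t_{\mathbf{V}}$ as we know that
$s_{\mathbf{XV}} \preceq t_{\mathbf{XV}}$. Therefore, $s_{\mathbf{WXYV}} \preceq t_{\mathbf{WXYV}}$. In scenario (2), $s_{\mathbf{WXY}} \prec t_{\mathbf{WXY}}$
implies that we can suffix list $\mathbf{WXY}$ by $\mathbf{V}$ and $s_{\mathbf{WXYV}} \preceq t_{\mathbf{WXYV}}$
holds. $\forall \mathbf{r}.\ \mathbf{WXYXV} \leftrightarrow \mathbf{WXYV}$. □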

Lemma 5. (soundness of Transitivity) Transitivity is sound. 

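Proof. Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$. By $\mathbf{X} \mapsto \mathbf{Y}$, which is given,
$s_{\mathbf{Y}} \preceq t_{\mathbf{Y}}$, which implies $s_{\mathbf{Z}} \preceq t_{\mathbf{Z}}$ (by $\mathbf{Y} \mapsto \mathbf{Z}$), and it ends the proof.
$\forall \mathbf{r}.\ \mathbf{X} \mapsto \mathbf{Y} \wedge \mathbf{Y} \mapsto \mathbf{Z}$ implies $\mathbf{X} \mapsto \mathbf{Z}$. □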

Lemma 6. (Soundness of Suffix) Suffix is sound. 

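Proof. (IF) Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$. Therefore $s_{\mathbf{Y}} \preceq t_{\mathbf{Y}}$ as
$\mathbf{X} \mapsto \mathbf{Y}$ is given, which implies that $s_{\mathbf{YX}} \preceq t_{\mathbf{YX}}$ ($\mathbf{X} \mapsto \mathbf{YX}$).

(ONLY IF) Let $s, t \in \mathbf{r}$, such that $s_{\mathbf{YX}} \preceq t_{\mathbf{YX}}$. Therefore (1)
$s_{\mathbf{Y}} = t_{\mathbf{Y}}$ and $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ or (2) $s_{\mathbf{Y}} \prec t_{\mathbf{Y}}$ is true. Scenario (1) directly
implies that $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ ($\mathbf{YX} \mapsto \mathbf{X}$). Scenario (2), where $s_{\mathbf{Y}} \prec t_{\mathbf{Y}}$,
implies that $s_{\mathbf{X}} \preceq t_{\mathbf{X}}$. This is because $s_{\mathbf{X}} \not\preceq t_{\mathbf{X}}$ implies $t_{\mathbf{X}} \preceq s_{\mathbf{X}}$,
which implies $t_{\mathbf{Y}} \preceq s_{\mathbf{Y}}$. Hence $s_{\mathbf{Y}} \not\prec t_{\mathbf{Y}}$. This ends the proof as
$s_{\mathbf{X}} \preceq t_{\mathbf{X}}$ ($\mathbf{YX} \mapsto \mathbf{X}$). $\forall \mathbf{r}.\ \mathbf{X} \leftrightarrow \mathbf{YX}$. □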

Lemma 7. (Soundness of Chain) Chain is sound. 

Proof. Without loss of generality, assume that the lists in the
axiom are single attributes. Let $\mathbf{X} = A$, $\mathbf{Y}_1 = B_1, \ldots, \mathbf{Y}_n = B_n$ and
$\mathbf{Z} = C$. This simplification makes it easier to extend the rule to
lists. The proof is by contradiction. Assume that $A$ and $C$ are order
incompatible. Then there are two tuples for which there is a swap
(the notion of swap is formalized in Definition 14) of the values
between $A$ and $C$. Also, the two tuples disagree on attribute $B_i$ for
all $i$. Otherwise condition number 4 would not be true. As $A \sim B_1$,
the values for $B_1$ follow $A$, and so do the rest of the attributes $B_i$ because
of condition (2). This means the two rows look in the
following way:

[Two-row table over columns $A$, $B_1$, $B_2$, ..., $B_n$, $C$; cell values not recoverable.]

Figure 3. $A$ order incompatible with $C$.

But then $B_n$ is order incompatible with $C$, which we assumed not
to be the case. We conclude with contradiction. □

Theorem 1. (soundness). OD1-OD6 axioms are sound for 
logical implication of ODs. 

Proof. In order to prove the soundness of J we have to prove 
that each of the rules is sound. This is Lemma 2 - Lemma 7. □ 
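
The soundness lemmas can also be spot-checked on concrete data. The following is a minimal sketch, not part of the formal proof: the helper names `project` and `satisfies_od` and the sample rows are our own illustrative assumptions. It tests whether a small table satisfies an OD by comparing every pair of rows lexicographically, and then checks one instance of the Suffix rule.

```python
def project(row, attrs):
    """Project a row (a dict) onto a list of attributes, as a tuple."""
    return tuple(row[a] for a in attrs)


def satisfies_od(table, lhs, rhs):
    """Return True if the table satisfies the OD lhs |-> rhs.

    For every pair of rows s, t, s[lhs] <= t[lhs] (lexicographically)
    must imply s[rhs] <= t[rhs].
    """
    for s in table:
        for t in table:
            if project(s, lhs) <= project(t, lhs) and not (
                project(s, rhs) <= project(t, rhs)
            ):
                return False
    return True


# Hypothetical sample rows in the spirit of the Dates table of Example 1.
dates = [
    {"year": 2001, "quarter": 1, "month": 2},
    {"year": 2001, "quarter": 2, "month": 5},
    {"year": 2002, "quarter": 1, "month": 1},
]

X = ["year", "month"]
Y = ["year", "quarter"]

# X |-> Y holds on this sample.
holds = satisfies_od(dates, X, Y)

# Suffix: from X |-> Y we expect X <-> YX, i.e. both directions hold.
suffix_ok = satisfies_od(dates, X, Y + X) and satisfies_od(dates, Y + X, X)
```

A check of this kind can only falsify a rule on a given table; it does not replace the proofs above.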

3.3 Theorems 

We introduce additional inference rules as they will be used 
throughout the paper. 



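Theorem 2. (Union)

  1. $X \mapsto Y$
  2. $X \mapsto Z$
  Conclusion: $X \mapsto YZ$

Proof.

  3. $YX \mapsto YZ$   [Pref(2)]
  4. $X \leftrightarrow YX$   [Suf(1)]
  $X \mapsto YZ$   [Tran(3, 4)] □

Theorem 3. (Augmentation)

  1. $X \mapsto Y$
  Conclusion: $XZ \mapsto Y$

Proof.

  2. $XZ \mapsto X$   [Ref]
  $XZ \mapsto Y$   [Tran(1, 2)] □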



Theorem 4. (Shift)

  1. $W \leftrightarrow V$
  2. $X \mapsto Y$
  Conclusion: $WX \mapsto VY$

Proof.

  3. $VX \mapsto W$   [Aug(1)]
  4. $VVX \mapsto VW$   [Pref(3)]
  5. $VVX \leftrightarrow VX$   [Norm]
  6. $VX \mapsto VW$   [Tran(4, 5)]
  7. $VX \leftrightarrow VWVX$   [Suf(6)]
  8. $VWX \leftrightarrow VWVX$   [Norm]
  9. $VX \leftrightarrow VWX$   [Tran(7, 8)]
  10. $WX \mapsto V$   [Aug(1)]
  11. $WX \mapsto VWX$   [Suf(10)]
  12. $WX \mapsto VX$   [Tran(9, 11)]
  13. $VX \mapsto VY$   [Pref(2)]
  $WX \mapsto VY$   [Tran(12, 13)] □



Theorem 5. (Decomposition)

  1. $X \mapsto ZY$
  Conclusion: $X \mapsto Z$

Proof.

  2. $ZY \mapsto Z$   [Ref]
  $X \mapsto Z$   [Tran(1, 2)] □



The following theorem is helpful for proving the Eliminate, Left
Eliminate, and Drop theorems.

Theorem 6. (Replace)

  1. $M \leftrightarrow N$
  Conclusion: $XMZ \leftrightarrow XNZ$

Proof.

  2. $Z \mapsto Z$   [Ref]
  3. $MZ \mapsto NZ$   [Shift(1, 2)]
  4. $NZ \mapsto MZ$   [Shift(1, 2)]
  5. $XMZ \mapsto XNZ$   [Pref(3)]
  6. $XNZ \mapsto XMZ$   [Pref(4)]
  $XMZ \leftrightarrow XNZ$   [Tran(5, 6)] □



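Theorem 7. (Eliminate)

  1. $X \mapsto Y$
  Conclusion: $MXNYW \leftrightarrow MXNW$

Proof.

  2. $X \leftrightarrow YX$   [Suf(1)]
  3. $XX \mapsto XYX$   [Pref(2)]
  4. $X \leftrightarrow XX$   [Norm]
  5. $XY \leftrightarrow XYX$   [Norm]
  6. $X \leftrightarrow XY$   [Tran(3-5)]
  7. $MXYNYW \leftrightarrow MXNYW$   [Rep(6)]
  8. $MXYNYW \leftrightarrow MXYNW$   [Norm]
  9. $MXYNW \leftrightarrow MXNW$   [Rep(6)]
  $MXNYW \leftrightarrow MXNW$   [Tran(7-9)] □

Theorem 8. (Left Eliminate)

  1. $X \mapsto Y$
  Conclusion: $VYXZ \leftrightarrow VXZ$

Proof.

  2. $X \leftrightarrow YX$   [Suf(1)]
  $VYXZ \leftrightarrow VXZ$   [Rep(1, 2)] □

Theorem 9. (Drop)

  1. $X \mapsto VYZW$
  2. $X \leftrightarrow V$
  Conclusion: $X \mapsto VZ$

Proof.

  3. $VYZW \leftrightarrow XYZW$   [Rep(2)]
  4. $X \mapsto XYZW$   [Tran(1, 3)]
  5. $X \mapsto XY$   [Dec(4)]
  6. $XZW \mapsto XY$   [Aug(5)]
  7. $XZW \leftrightarrow XYXZW$   [Suf(6)]
  8. $XYXZW \leftrightarrow XYZW$   [Norm]
  9. $XZW \leftrightarrow XYZW$   [Tran(7, 8)]
  10. $X \mapsto XZW$   [Tran(4, 9)]
  11. $XZW \leftrightarrow VZW$   [Rep(2)]
  12. $X \mapsto VZW$   [Tran(10, 11)]
  $X \mapsto VZ$   [Dec(12)] □
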
Theorem 10. (Path)

  1. $X \mapsto YW$
  2. $Y \mapsto VMN$
  Conclusion: $X \mapsto YMW$

Proof.

  3. $X \mapsto Y$   [Dec(1)]
  4. $X \mapsto VMN$   [Tran(2, 3)]
  5. $X \mapsto YVMN$   [Union(3, 4)]
  6. $YVMN \leftrightarrow YVMNM$   [Norm]
  7. $X \mapsto YVMNM$   [Tran(5, 6)]
  8. $X \mapsto YM$   [Elim(2, 7)]
  9. $X \mapsto YMYW$   [Union(1, 8)]
  10. $YMYW \leftrightarrow YMW$   [Norm]
  $X \mapsto YMW$   [Tran(1, 10)] □



4. COMPLETENESS 

In Section 4.1, we sketch the important elements of the proof for 
completeness of our OD axiomatization. We establish that ODs 
subsume FDs in Section 4.2, followed by the formal completeness 
proof of our axiomatization in Section 4.3. 

4.1 Sketch of the Overall Proof 

Our proof is constructive. To prove the axiomatization is
complete, it suffices to demonstrate that, for any set of ODs $\mathcal{M}$, a
table t can be constructed that satisfies (Lemma 14), and is
complete (Lemma 15) with respect to, $\mathcal{M}$ using J, the
axiomatization.

Definition 11. (a table t satisfies $\mathcal{M}$) A table t satisfies $\mathcal{M}$ iff
no OD that is derivable over $\mathcal{M}$ using J (thus, in $\mathcal{M}^+$) is falsified
by the table t.

Definition 12. (a table t is complete with respect to $\mathcal{M}$) A
table t is complete with respect to $\mathcal{M}$ iff every OD that is
constructible over the attributes that appear in $\mathcal{M}$ and that is not
derivable over $\mathcal{M}$ using J (thus, is not in $\mathcal{M}^+$) is falsified by the
table t.

In Section 3.2, in Theorem 1, we proved the soundness of J. Thus,
any table that satisfies each OD in $\mathcal{M}$ satisfies $\mathcal{M}^+$, and no table
that satisfies $\mathcal{M}$ can falsify any OD in $\mathcal{M}^+$. An OD $X \mapsto Y$ can be
falsified in just two ways by a table. (See Theorem 15.) We name
these two ways split and swap.

Definition 13. (split) A split with respect to an OD $X \mapsto XY$ is
a pair of tuples t and s in table t such that $t_X = s_X$ but $t_Y \neq s_Y$;
that is, they have the same value for X but different values
for Y. Thus, the split from t falsifies $X \mapsto XY$.
(Consequently, $X \mapsto Y$ is falsified, too.) This just says that
set(X) does not functionally determine set(Y).

Definition 14. (swap) A swap with respect to an OD
$XY \leftrightarrow YX$ is a pair of tuples t and s in table t such that $t_X \prec s_X$
but $s_Y \prec t_Y$; i.e., t comes before s in any stream satisfying order
by X, but s comes before t in any stream satisfying order by
Y. Thus, the swap from t falsifies $XY \leftrightarrow YX$. (Consequently,
$X \mapsto Y$ is falsified, too.)
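
Definitions 13 and 14 translate directly into pairwise searches over a table. The sketch below is illustrative only; the function names `find_split` and `find_swap` and the three-row sample table are assumptions of ours, not part of the construction used in the proof.

```python
def project(row, attrs):
    """Project a row (a dict) onto a list of attributes, as a tuple."""
    return tuple(row[a] for a in attrs)


def find_split(table, X, Y):
    """Return a pair (s, t) with equal X-values but different Y-values, or None."""
    for i, s in enumerate(table):
        for t in table[i + 1:]:
            if project(s, X) == project(t, X) and project(s, Y) != project(t, Y):
                return s, t
    return None


def find_swap(table, X, Y):
    """Return a pair (t, s) with t before s on X but s before t on Y, or None."""
    for t in table:
        for s in table:
            if project(t, X) < project(s, X) and project(s, Y) < project(t, Y):
                return t, s
    return None


# Hypothetical table with one split (on A, C) and one swap (between A and B).
table = [
    {"A": 0, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 0},
    {"A": 1, "B": 0, "C": 1},
]

split_pair = find_split(table, ["A"], ["C"])  # rows 2 and 3 agree on A, differ on C
swap_pair = find_swap(table, ["A"], ["B"])    # rows 1 and 2 are ordered oppositely
```

Here the split witnesses that A does not functionally determine C, and the swap witnesses that A and B are not order compatible on this table.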

The table t that we construct for the set of order dependencies $\mathcal{M}$
will consist of two parts: split($\mathcal{M}$) and swap($\mathcal{M}$). We shall
construct these two parts of t (the first half of the table, split($\mathcal{M}$),
and the second half, swap($\mathcal{M}$)) in such a way that t satisfies $\mathcal{M}$.
The purpose of split($\mathcal{M}$) will be to falsify every OD of the form
$X \mapsto XY$ not in $\mathcal{M}^+$. The purpose of swap($\mathcal{M}$) will be to falsify
every OD of the form $X \mapsto Y$, $XY \leftrightarrow YX$ not in $\mathcal{M}^+$ but for which
$X \mapsto XY$ is in $\mathcal{M}^+$. (So $X \mapsto Y$ is not in $\mathcal{M}^+$, by Theorem 15.)
Thus, t is complete for $\mathcal{M}$.

Definition 15. (split($\mathcal{M}$)) Split($\mathcal{M}$) is a table that demonstrates,
for each $X \mapsto XY$ which is not in $\mathcal{M}^+$, that $X \mapsto XY$ is falsified by
a split (and so falsifies $X \mapsto Y$, too).

Definition 16. (swap($\mathcal{M}$)) Swap($\mathcal{M}$) is a table that
demonstrates, for each $XY \leftrightarrow YX$ which is not in $\mathcal{M}^+$, that
$XY \leftrightarrow YX$ is falsified by a swap (and so falsifies $X \mapsto Y$, too).

The Chain axiom is used to prove the following two theorems.



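Theorem 11. (Partition)

  1. $X \mapsto Y$
  2. $X \mapsto Z$
  3. set(Y) = set(Z)
  Conclusion: $Y \leftrightarrow Z$

Proof.

  4. $X \leftrightarrow YX$   [Suf(1)]
  5. $X \mapsto X$   [Ref]
  6. $X \mapsto XY$   [Union(1, 5)]
  7. $X \leftrightarrow XYX$   [Suf(6)]
  8. $X \leftrightarrow XY$   [Norm(7)]
  9. $XY \leftrightarrow YX$   [Tran(4, 8)]
  10. $XZ \leftrightarrow ZX$   [2, 4-9]
  11. $X \sim Y$   [(9)]
  12. $X \sim Z$   [(10)]
  13. $XYZ \mapsto XYZ$   [Ref]
  14. $XY \sim XZ$   [Elim(1, 2, 13)]
  15. $Y \sim Z$   [Chain(11-14)]
  16. $YZ \leftrightarrow ZY$   [(15)]
  $Y \leftrightarrow Z$   [Norm(3, 16)] □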


Theorem 12. (Downward Closure)

  1. $XY \sim ZV$
  Conclusion: $X \sim Z$

Proof.

  2. $ZVXY \mapsto Z$   [Ref]
  3. $XYZV \mapsto Z$   [Tran(1, 2)]
  4. $XYZV \mapsto X$   [Ref]
  5. $XYZV \mapsto XZ$   [Union(3, 4)]
  6. $XYZV \mapsto ZX$   [Union(3, 4)]
  $X \sim Z$   [Part(5, 6)] □



In the table t that we construct, we shall use integer values for the
cells. (A cell is a given column entry of a given row.) We
construct table t by adding splits and swaps. We have to make
sure that these pieces, combined together, do not interfere. That is
why we formalize the notion of append. When we append two
tables $t_1$ and $t_2$, we shall ensure that the resulting table cannot
introduce any splits (except $[] \mapsto X$) or swaps beyond those that
appear in $t_1$ and in $t_2$ alone (Lemma 9).

Definition 17. (append) Appending two sub-tables $t_1$ and $t_2$ is
accomplished by the following steps:

1. Find the minimum value, x, over all cells of $t_1$. Subtract
x from all cells in $t_1$. (Now its minimum value is zero.)
Do the same for $t_2$.

2. Find the maximum value, y, over all cells of $t_1$. Add
y + 1 to all cells in $t_2$. The resulting table of the append
is the union of $t_1$ and $t_2$ as adjusted in steps 1 and 2.
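
The two steps of Definition 17 can be sketched as follows. This is a minimal illustration under the assumption that a table is a list of rows, each row a dict from attribute name to an integer cell; the function name `append` and the sample tables are ours.

```python
def append(t1, t2):
    """Append two sub-tables with integer cells, following Definition 17."""

    def shift_to_zero(table):
        # Step 1: subtract the minimum cell value so the minimum becomes zero.
        lowest = min(v for row in table for v in row.values())
        return [{a: v - lowest for a, v in row.items()} for row in table]

    t1 = shift_to_zero(t1)
    t2 = shift_to_zero(t2)

    # Step 2: lift every cell of t2 above the maximum cell value of t1.
    highest = max(v for row in t1 for v in row.values())
    t2 = [{a: v + highest + 1 for a, v in row.items()} for row in t2]

    return t1 + t2


# Hypothetical sub-tables: t1 holds a swap on A, B; t2 holds a split on A, B.
t1 = [{"A": 5, "B": 6}, {"A": 6, "B": 5}]
t2 = [{"A": 0, "B": 0}, {"A": 0, "B": 1}]

combined = append(t1, t2)
```

After the append, every cell that came from `t2` is strictly larger than every cell that came from `t1`, which is the property Lemma 9 relies on.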



[Figures 4-6: three tables over the columns A, B, C, D.]

Figure 4. Table $t_1$.

Figure 5. Table $t_2$.

Figure 6. $t_1$ append $t_2$.

The table t we construct will be split($\mathcal{M}$) append swap($\mathcal{M}$)
(which we call split-swap form). We shall construct split($\mathcal{M}$) in a
way that is analogous to the construction in Ullman's proof of the
completeness of Armstrong's axiomatization for FDs in [20]. This
proves our axiomatization for ODs is sound and complete over
FDs.

We shall construct swap($\mathcal{M}$) in a way to falsify each OD $X \mapsto Y$
not in $\mathcal{M}^+$ (but for which $X \mapsto XY$ is in $\mathcal{M}^+$). This construction
will be more complex than for split($\mathcal{M}$). For each pair of
attributes A and B from $\mathcal{M}$, we determine whether there needs to
be a swap between A and B (a pair of tuples s and t such that
$t_A \prec s_A$ but $s_B \prec t_B$) and, if so, the context in which the swap
between A and B needs to occur.

Definition 18. (constant) An attribute A is called a constant
with respect to $\mathcal{M}$ iff $[] \mapsto A$ is in $\mathcal{M}^+$. Call an attribute a
non-constant, otherwise.

If an attribute is a constant, it can have only a single value
occurring in any table that satisfies $\mathcal{M}$.

Definition 19. (context) A set of non-constant attributes $\mathcal{X}$
with respect to $\mathcal{M}$ is a context of a swap t, s iff $t_X = s_X$. We say
swap t, s is in the context of $\mathcal{X}$ iff $t_X = s_X$. (Note that a context
for a swap t, s is not unique.)

By identifying the right contexts for swaps for each pair A and B,
swap($\mathcal{M}$) will falsify each $X \mapsto Y$ not in $\mathcal{M}^+$ (but with $X \mapsto XY$ in
$\mathcal{M}^+$), while not falsifying anything in $\mathcal{M}^+$ (Lemma 13). This step
is the cornerstone of our proof for completeness.

Constructing table swap($\mathcal{M}$) is not straightforward. We are able to
simplify the construction via structural induction. The hypothesis
is as follows.

Hypothesis 1 (hypothesis). For some fixed integer K, for any
set of ODs $\mathcal{M}$ composed over attributes $\{E_1, \ldots, E_K\}$, there exists a
table t in split-swap form that satisfies, and is complete with
respect to, $\mathcal{M}$.

We prove the base case of this for $K \leq 2$ (in Lemma 11). We
hypothesize this is true for any $\mathcal{M}$ with a fixed number K of
attributes. We then prove that the hypothesis remains true for any
$\mathcal{M}$ with K + 1 attributes (Theorem 17). Proof of the
induction hypothesis in essence completes the overall proof.

Induction provides us with a powerful mechanism within the
proof. Consider any $\mathcal{M}$ with K + 1 attributes. In the first case, if
any of the attributes are constants with respect to $\mathcal{M}$, we can
reduce the problem. We effectively project out those constant
attributes from $\mathcal{M}$. This means we simply remove all occurrences
of the attributes in the ODs. For example, if we are projecting out
B and E, $ABC \mapsto DEF$ becomes $AC \mapsto DF$. Call the result $\mathcal{M}'$.
Then, $\mathcal{M}'$ is over K or fewer attributes. By the induction
hypothesis there is a table t' that satisfies, and is complete
with respect to, $\mathcal{M}'$. We can show easily how to construct a table t
from t' which must satisfy, and be complete with respect to, $\mathcal{M}$.
This is established by Lemma 8.
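
Projecting out attributes is a purely syntactic operation on the ODs. As a small sketch, assuming an OD is represented as a pair of attribute lists (the representation and the name `project_out` are our own):

```python
def project_out(ods, constants):
    """Remove every occurrence of the given attributes from each OD.

    Each OD is a pair (lhs, rhs) of attribute lists; list order is preserved.
    """
    drop = set(constants)
    return [
        ([a for a in lhs if a not in drop], [a for a in rhs if a not in drop])
        for lhs, rhs in ods
    ]


# The example from the text: projecting B and E out of ABC |-> DEF.
ods = [(list("ABC"), list("DEF"))]
reduced = project_out(ods, ["B", "E"])
```

For the example in the text, `reduced` contains the single OD with left-hand side A, C and right-hand side D, F.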

Lemma 8. Let r be a table that satisfies, and is complete with
respect to, $\mathcal{M}$. Let Z be an attribute not in $\mathcal{M}$. Construct table r'
as r with an extra column Z, and the same single value for Z in
each row. Then r' satisfies, and is complete with respect to,
$\mathcal{M} \cup \{[] \mapsto Z\}$.

Proof. It is straightforward that r' satisfies $\mathcal{M} \cup \{[] \mapsto Z\}$
because Z is a constant in r' and Z does not appear in $\mathcal{M}$.

Clearly, r' falsifies each $X \mapsto Y$ that does not mention Z that r
falsifies. Any $X \mapsto Y$ that mentions Z is equivalent to some
OD that does not mention Z by the Replace rule, which has
already been established. Thus r' satisfies, and is complete with
respect to, $\mathcal{M} \cup \{[] \mapsto Z\}$. □

In the second case, we may assume $\mathcal{M}$ contains no constant
attributes. When considering the pair A and B, if we find they
require a swap in non-empty context $\mathcal{X}$, we can "freeze" the
attributes of $\mathcal{X}$ to a single value. This is true for any table that
satisfies $\mathcal{M}' = \mathcal{M} \cup \{[] \mapsto X_1, \ldots, [] \mapsto X_n\}$, where
$\mathcal{X} = \{X_1, \ldots, X_n\}$. Now, we have an instance with K or fewer
non-constant attributes. By our induction hypothesis, there exists a
table t' in split-swap form that satisfies, and is complete with
respect to, $\mathcal{M}'$. Note that $\mathcal{M}'^+ \supseteq \mathcal{M}^+$. Thus, t' does not falsify
any ODs in $\mathcal{M}^+$. We append t' to the table t that we are
constructing. (Appending these is safe, since $\mathcal{M}$ has no constants.)
Our table swap($\mathcal{M}$) therefore is a recursive appending of
(sub)tables.

There is the case of attributes A and B such that $\mathcal{M}$ dictates they
must have a swap, but in the empty context {}. This time, we
cannot use the induction hypothesis to construct the tuples (t')
that do the job for us. For this case, however, we can construct two
tuples directly that introduce a swap for A and B, but that do not
introduce swaps between any other pair of attributes that would
falsify any OD in $\mathcal{M}^+$. (The soundness of this step is established
in Lemma 12.)

For the latter, we must show that, for each $X \mapsto Y$ not in $\mathcal{M}^+$
such that $X \mapsto XY$ is in $\mathcal{M}^+$, some sub-table in swap($\mathcal{M}$) by our
construction does falsify it. This is done by proving there always
is an attribute A in X, an attribute B in Y, and a swap between A
and B in some context $\mathcal{W}$, which falsifies $X \mapsto Y$. (This is part of
Lemma 15.) That completes the proof. These pieces are formally
proved in the next two sections.

4.2 ODs Subsume FDs

In this section we show completeness of our axiomatization over 
FDs. This result is then used toward showing completeness over 
ODs. 

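Theorem 13. (FD and OD correspondence) For every instance
r of relation R, $\mathcal{X} \rightarrow \mathcal{Y}$ iff $X \mapsto XY$, for all lists X that order the
attributes of $\mathcal{X}$ and all lists Y likewise for $\mathcal{Y}$.

Proof. (IF) If $X \mapsto XY$ holds, by Lemma 1 $\mathcal{X} \rightarrow \mathcal{X}\mathcal{Y}$ is true. By
Armstrong's axiom Reflexivity, $\mathcal{X}\mathcal{Y} \rightarrow \mathcal{Y}$ holds. Therefore, by
Armstrong's axiom Transitivity, $\mathcal{X} \rightarrow \mathcal{Y}$ is true.

(ONLY IF) If $X \mapsto XY$ does not hold, there exist $s, t \in r$ such
that $s_X \preceq t_X$ but $s_{XY} \not\preceq t_{XY}$. This implies that $s_X = t_X$ and $t_Y \prec s_Y$.
Therefore $s_Y \neq t_Y$ and $s_X = t_X$, and $\mathcal{X} \rightarrow \mathcal{Y}$ is not true. □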



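Theorem 14. (Permutation)

  1. $X \mapsto XY$
  Conclusion: $X' \mapsto X'Y'$

Proof. Let $Y = [Y_1, Y_2, \ldots, Y_n]$, $\forall k \in [1, n]$:

  2. $X \mapsto XY_1 \ldots Y_k$   [Dec(1)]
  3. $X'X \mapsto X'XY_1 \ldots Y_k$   [Pref(2)]
  4. $X'X \leftrightarrow X'$   [Norm]
  5. $X'XY_1 \ldots Y_k \leftrightarrow X'Y_1 \ldots Y_k$   [Norm]
  6. $X' \mapsto X'Y_1 \ldots Y_k$   [Tran(3-5)]
  7. $X' \mapsto X'$   [Ref]
  8. $X' \mapsto X'Y_k$   [Drop(6, 7)]
  $X' \mapsto X'Y'$   [Union(8)] □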






An OD $X \mapsto Y$ can be falsified in two ways by a table (Theorem
15). That is why we introduced split and swap (Section 4).

Theorem 15. (order dependency) $X \mapsto Y$ holds iff $X \mapsto XY$ and
$XY \leftrightarrow YX$.

Proof. (IF) If $X \mapsto Y$ holds then the Suffix rule tells us that
$X \leftrightarrow YX$. $X \mapsto X$ follows from Reflexivity, therefore $X \mapsto XY$ by
Union and $XY \leftrightarrow YX$ by Replace, Suffix and Normalization.

(ONLY IF) Suppose $XY \leftrightarrow YX$ and $X \mapsto XY$ hold. Hence, by
Transitivity $X \mapsto YX$, which by Reflexivity and Transitivity tells us
that $X \mapsto Y$. □
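
Theorem 15 can be sanity-checked by brute force on small instances. The sketch below is only an empirical check under our own assumptions (rows are tuples, attributes are column indexes, and the names `od` and `theorem15_agrees` are hypothetical); it enumerates every two-row table over two binary columns and compares both sides of the equivalence.

```python
from itertools import product


def project(row, attrs):
    """Project a row (a tuple) onto a list of column indexes."""
    return tuple(row[i] for i in attrs)


def od(table, X, Y):
    """True iff the table satisfies the OD X |-> Y (lexicographic orders)."""
    return all(
        project(s, Y) <= project(t, Y)
        for s in table
        for t in table
        if project(s, X) <= project(t, X)
    )


def theorem15_agrees(table, X, Y):
    """Compare X |-> Y against (X |-> XY and XY <-> YX) on one table."""
    left = od(table, X, Y)
    right = (
        od(table, X, X + Y)
        and od(table, X + Y, Y + X)
        and od(table, Y + X, X + Y)
    )
    return left == right


rows = list(product([0, 1], repeat=2))                 # all rows over two binary columns
tables = [list(pair) for pair in product(rows, repeat=2)]  # all two-row tables
all_agree = all(theorem15_agrees(t, [0], [1]) for t in tables)
```

Agreement on all enumerated tables is consistent with the theorem; the derivation above remains the actual argument.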

Theorem 16. (ODs subsume FDs). Given the set of ODs $\mathcal{M}$,
the OD axioms are sound and complete over functional dependencies.

Proof. Soundness is by Theorem 1, because of the
correspondence between FDs and ODs (Theorem 13). The
remaining step is to prove completeness over FDs: if $\mathcal{M} \models
\mathcal{X} \rightarrow \mathcal{Y}$ then $\mathcal{M} \vdash_J \mathcal{X} \rightarrow \mathcal{Y}$. This is equivalent to saying that if $\mathcal{M} \models
X \mapsto XY$, then $\mathcal{M} \vdash_J X \mapsto XY$ for all lists X that order the
attributes of $\mathcal{X}$ and all lists Y likewise for $\mathcal{Y}$, by Theorem 13 and
Permutation.

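Firstly, we show that the axioms for ODs imply Armstrong's axioms
for FDs. We can do this because of the soundness of the axioms.

FD1 Reflexivity: $\mathcal{Y} \subseteq \mathcal{X}$ implies $\mathcal{X} \rightarrow \mathcal{Y}$.

1. We are given that $\mathcal{Y}$ is a subset of $\mathcal{X}$.

2. Therefore, the Normalization rule implies that an order
dependency $X \leftrightarrow XY$ holds, for some list X that orders
the attributes of $\mathcal{X}$ and some list Y likewise for $\mathcal{Y}$.

3. Hence, Permutation and Theorem 13 imply that the
FD $\mathcal{X} \rightarrow \mathcal{Y}$ holds.

FD2 Augmentation: $\mathcal{X} \rightarrow \mathcal{Y}$ implies $\mathcal{Z}\mathcal{X} \rightarrow \mathcal{Z}\mathcal{Y}$.

1. Since we are given $\mathcal{X} \rightarrow \mathcal{Y}$, Theorem 13 tells us $X \mapsto
XY$, for all lists X that order the attributes of $\mathcal{X}$ and all
lists Y likewise for $\mathcal{Y}$.

2. By Reflexivity we can infer $Z \leftrightarrow Z$, for all lists Z that
order the attributes of $\mathcal{Z}$. Hence, by the Prefix
rule, $ZX \mapsto ZXY$ holds.

3. By Suffix, $ZX \leftrightarrow ZXYZX$. ZXYZX may be normalized
($ZXYZX \leftrightarrow ZXYZ$).

4. By Transitivity, $ZX \mapsto ZXYZ$. Therefore, by Permutation
and Theorem 13, the FD $\mathcal{Z}\mathcal{X} \rightarrow \mathcal{Z}\mathcal{Y}$ holds.

FD3 Transitivity: $\mathcal{X} \rightarrow \mathcal{Y}$ and $\mathcal{Y} \rightarrow \mathcal{Z}$ implies $\mathcal{X} \rightarrow \mathcal{Z}$.

1. We are given $\mathcal{X} \rightarrow \mathcal{Y}$ and $\mathcal{Y} \rightarrow \mathcal{Z}$, so we may get
$X \mapsto XY$ and $Y \mapsto YZ$ for all lists X that order the
attributes of $\mathcal{X}$ and all lists Y likewise for $\mathcal{Y}$ and some
list Z likewise for $\mathcal{Z}$ by Theorem 13.

2. It follows by Reflexivity that $X \leftrightarrow X$, so by the Prefix rule
we can infer that $XY \mapsto XYZ$.

3. Since $X \leftrightarrow XY$ follows by Suffix, Normalization and
Transitivity, $X \mapsto XYZ$ follows from Transitivity.

4. Hence, by Decomposition, Permutation and Theorem 13,
the FD $\mathcal{X} \rightarrow \mathcal{Z}$ is true.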

However, this proves that the axiom system comprising the inference
rules J is sound and complete for the set of FDs $\mathcal{F}$. We would like
to show it is true for the set of ODs $\mathcal{M}$.

Let $\mathcal{M}' = \{X \mapsto XY,\ XY \leftrightarrow YX \mid X \mapsto Y \in \mathcal{M}\}$. Based on Theorem
15, $\mathcal{M}$ and $\mathcal{M}'$ are equivalent (Definition 9). Also let
$\mathcal{F} = \{\mathcal{X} \rightarrow \mathcal{Y} \mid X \mapsto XY \in \mathcal{M}'\}$. Based on the Permutation rule and
Theorem 13 we know that any relation instance satisfying the
dependencies in $\mathcal{F}$ satisfies the dependencies in $\mathcal{M}'$ and vice versa.

Let $\mathcal{X}^+$ [20], the closure of $\mathcal{X}$ (with respect to $\mathcal{F}$), be the set of
attributes A such that $\mathcal{X} \rightarrow A$ can be deduced from $\mathcal{F}$ by
Armstrong's axioms. We consider the relational instance r with
the two rows shown in the figure below.



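[Figure 7: a two-row table whose columns are grouped into "$\mathcal{X}^+$ attributes" and "Other attributes".]

Figure 7. A relation instance r showing that $\mathcal{M} \not\models X \mapsto XY$.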

Based on Ullman's [20] proof of soundness and completeness of
Armstrong's axioms, relation instance r shows that if $\mathcal{F}$ is the
given set of dependencies, and $\mathcal{X} \rightarrow \mathcal{Y}$ cannot be proved by
Armstrong's axioms, then r is a relation in which the dependencies of $\mathcal{F}$
hold but $\mathcal{X} \rightarrow \mathcal{Y}$ does not. That is, $\mathcal{F}$ does not logically imply
$\mathcal{X} \rightarrow \mathcal{Y}$. This means the inference rules are sound and complete over $\mathcal{F}$.
As there are no swaps in r, we do not falsify anything in $\mathcal{M}'$, and
therefore nothing in $\mathcal{M}$ either. This ends the soundness and completeness proof
for FDs over the set $\mathcal{M}$. □
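
The closure $\mathcal{X}^+$ and the two-row instance used in this proof can be sketched as follows. The closure loop is the standard fixpoint computation; the concrete cell values (1 everywhere in the first row, 0 outside the closure in the second) follow the usual textbook presentation of this construction and are an assumption here, as are the names `closure` and `two_row_instance`.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def two_row_instance(all_attrs, attrs, fds):
    """Two rows that agree exactly on the attributes of X+.

    Assumed values: the first row is all 1s; the second row has 1 on X+
    and 0 on every other attribute.
    """
    plus = closure(attrs, fds)
    row1 = {a: 1 for a in all_attrs}
    row2 = {a: (1 if a in plus else 0) for a in all_attrs}
    return [row1, row2]


# Hypothetical FD set: A -> B over attributes A, B, C.
fds = [({"A"}, {"B"})]
instance = two_row_instance("ABC", {"A"}, fds)
```

In this example the two rows agree on A and B but differ on C, so the instance satisfies A -> B and falsifies A -> C, which is not derivable.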

4.3 Completeness of the OD Axiomatization 

As discussed in Section 4, an OD can be falsified by a split or a
swap. Using this, our proof for completeness is by case. If
$X \mapsto XY$ is not in $\mathcal{M}^+$, there will be a split in the sub-table
split($\mathcal{M}$) that we construct that falsifies $X \mapsto XY$, and so
falsifies $X \mapsto Y$ also. If $X \mapsto Y$ is not in $\mathcal{M}^+$, but $X \mapsto XY$ is,
there will be a swap in sub-table swap($\mathcal{M}$) that falsifies $X \mapsto Y$.

Lemma 9. There is no split in $t_1$ append $t_2$ that is between rows
from $t_1$ and $t_2$, respectively, besides $[] \mapsto X$ for any X. There is
no swap in $t_1$ append $t_2$ that is between rows from $t_1$ and $t_2$,
respectively.

Proof. Let t be a tuple in $t_1$ and s be a tuple in $t_2$. Since all
values in t are less than all values in s, it is impossible for there to
be a split (except $[] \mapsto X$) or swap introduced between $t_1$ and
$t_2$ within $t_1$ append $t_2$ (Definition 17). □

We construct table t to satisfy, and to be complete with respect to,
$\mathcal{M}$. Table t will be split($\mathcal{M}$) append swap($\mathcal{M}$), as introduced
above. Note that by Theorem 15 these are the only two scenarios.

Table split($\mathcal{M}$) is constructed by appending two rows to the table,
as in Figure 7, for each subset of attributes $\mathcal{X}$ from $\mathcal{M}$.

Lemma 10. (split($\mathcal{M}$) satisfies $\mathcal{M}$). For any $\mathcal{M}$ with no
constants, split($\mathcal{M}$) does not falsify any OD in $\mathcal{M}$.

Proof. The relational instance split($\mathcal{M}$) we have constructed
contains splits, but no swaps. Therefore $X \mapsto Y$ could only be
falsified by a split. (Consequently, $X \mapsto XY$ is falsified, too.) But we
know that we are sound and complete over the set of FDs by
Theorem 16, and by Lemma 9 appending the tables does not
introduce additional splits (except $[] \mapsto X$) or swaps; therefore this
is not possible. □

Table split($\mathcal{M}$) is based on the table we constructed for $\mathcal{M}$ in the
proof of Theorem 16, which establishes that ODs subsume FDs;
that is, split($\mathcal{M}$) satisfies $\mathcal{M}$ and it is complete with respect to the
ODs of the form $X \mapsto XY$ (which are equivalent to FD statements,
by Theorem 13) in that it falsifies each $X \mapsto XY$ not in $\mathcal{M}^+$ but
which is composable over the attributes in $\mathcal{M}$. As constructed,
split($\mathcal{M}$) introduces no swaps.

For swap($\mathcal{M}$) a natural approach would seem to be to construct
the table incrementally, to falsify each OD not in $\mathcal{M}^+$, in turn,
while ensuring we do not also falsify any OD in $\mathcal{M}^+$, in each
step. This would be similar to how we constructed split($\mathcal{M}$).
However, how to do this by a straightforward construction is not
apparent. When considering how to falsify $X \mapsto Y$, which
attributes from X and from Y, respectively, should have a swap
appear in the table? And how do we ensure that this swap does not
falsify any OD in $\mathcal{M}^+$? Instead, we consider every pair of
attributes, A and B, from the set of attributes in $\mathcal{M}$. We
determine the relevant contexts, if any, in which a swap with
respect to A and B must occur in swap($\mathcal{M}$).

The set(XY) is a context for A, B with respect to $\mathcal{M}$ iff $XA \sim Y$
and $X \sim YB$ are in $\mathcal{M}^+$, but $XA \sim YB$ is not in $\mathcal{M}^+$. If there
exists such a context for A, B, this indicates there should be a
swap between A and B (to falsify $XA \sim YB$). It also indicates the
"context" of the swap, as the swap must not falsify $XA \sim Y$ or
$X \sim YB$. One could imagine constructing a swap (a pair of rows t
and s for this) by having $t_{XY} = s_{XY}$. That way, the swap t, s
would not falsify $XA \sim Y$ or $X \sim YB$. But what should the values
of t and s be outside of XY? We cannot construct t and s simply,
ensuring the swap s, t does not falsify anything in $\mathcal{M}^+$. Instead,
we use structural induction. Consider for now that XY is
non-empty. If we added $[] \mapsto XY$ to $\mathcal{M}$ (call the result $\mathcal{M}'$), XY can
only have a single value in any table that satisfies $\mathcal{M}'$. Recall
Hypothesis 1 in Section 4. We adopt this as our
induction hypothesis. Assume our present $\mathcal{M}$ contains K + 1
attributes. Then $\mathcal{M}'$ contains K or fewer non-constant attributes, since $[] \mapsto XY$.
By our induction hypothesis, there is a table t' (see Figure 8) that
satisfies, and is complete with respect to, $\mathcal{M}'$. As $XA \sim YB$ is not
in $\mathcal{M}^+$, it is not in $\mathcal{M}'^+$ either. Thus t' falsifies $XA \sim YB$.

[Figure 8: a table with generic cell entries $a_{i,j}$.]

Figure 8. A relation instance for K + 1 non-constant attributes.

Which context for A, B should we do this for? Not for all of them.
It is the maximal contexts that are relevant. X, Y is a maximal
context for A, B iff it is a context for A, B and there is no other
context X', Y' such that set(X'Y') $\supset$ set(XY).

Since we use induction in the proof, we need to prove a base case
of the induction hypothesis. We prove it for the cases of $\mathcal{M}$ with
0, 1, and 2 non-constant attributes in the following lemma.

Lemma 11. (Induction base, $K \leq 2$). For $K \leq 2$
attributes there exists a table t in split-swap form that satisfies,
and is complete with respect to, $\mathcal{M}$.

Proof. This can be directly shown by enumerating all
the possibilities. □

We have assumed so far that the (maximal) contexts, if any, for A, 
B are non-empty. There is the case where A, B has a single 
maximal context {}, the empty context. In this case, we cannot 
appeal to the induction hypothesis. Fortunately, such a pair A, B
will have special properties by virtue of the fact they have 
swapped orders only in the empty context. In fact, our sixth axiom 
schema speaks directly to this very case. (We likely would never 
have had the insight for the sixth axiom (schema) Chain had we 
not encountered this case while attempting to prove 
completeness.) In this case, we will be able to construct a two-row 
swap for A, B directly that does not falsify anything in M + . 

Lemma 12. (Empty context). There exists a swap for A, B with
the empty maximal context that satisfies $\mathcal{M}$ while falsifying $A \sim B$.

Proof. We construct a two-row swap with values 0 and 1 that
falsifies $A \sim B$ but cannot falsify anything in $\mathcal{M}^+$, as shown in
Figure 9. For the latter, it suffices to prove that the swap does not
falsify any $C \sim D$ in $\mathcal{M}^+$. For A and B, they have opposite values
in each row in the swap. For any C such that $A \sim C$ is in $\mathcal{M}^+$, C
must have the same value as A in each row. (Otherwise, A and C
would have swapped values, 0 and 1, between the two rows.)
Likewise for B. And for any D such that $C \sim D$ is in $\mathcal{M}^+$, D must
have the same value as C (and so the same as A) in each row. And
so forth. Of course, it would be impossible to construct our two
rows if there is a chain connecting A and B through
order-compatibility: $A \sim E_1 \sim \ldots \sim E_n \sim B$. If there were, we would need
to set the value of each of $E_1, \ldots, E_n$ the same as A's value and the
same as B's value in each row. But A's and B's values differ. The
Chain axiom schema (OD6) ensures there is no such chain from A
to B. $E_i A \sim E_i B$ is in $\mathcal{M}^+$, for each $E_i$, since the maximal context
for A, B is []. If there were a chain $A \sim E_1 \sim \ldots \sim E_n \sim B$ such that
$A \sim E_1$ is in $\mathcal{M}^+$, $E_i \sim E_{i+1}$ is in $\mathcal{M}^+$ for each i in $1, \ldots, n - 1$, and
$E_n \sim B$ is in $\mathcal{M}^+$, then $A \sim B$ is in $\mathcal{M}^+$ also, by the Chain axiom.
Since we know that $A \sim B$ is not in $\mathcal{M}^+$, there is no such chain.
Thus, our two rows are constructible. We can partition the
attributes into three groups: those that must have the same values
as A, those the same as B, and those for which it does not matter.
Figure 9 shows the construction.

Figure 9. Swap for A, B with the empty maximal context.

For attributes that do not match A or B, it is important we do not
introduce swaps between them, as this could falsify something in
$\mathcal{M}^+$. It suffices to use the same value for these in each row. Call
the two-row swap in Figure 9 r. Table r satisfies $\mathcal{M}$. Assume
otherwise: for some $X \mapsto Y \in \mathcal{M}$, r falsifies it. Let $X \mapsto Y$ be over
non-constant attributes, without loss of generality. Let E be the first
element of X, and F of Y. If both E and F are from A, A's group,
or the remaining group of attributes (as in Figure 9), or they are both
from B or B's group of attributes, then X and Y order the two tuples
of r the same way. Therefore, E must be from one group, and F
from the other. Since $X \mapsto Y \in \mathcal{M}^+$, $X \sim Y \in \mathcal{M}^+$ by Theorem 15.
By the Downward Closure rule, $E \sim F \in \mathcal{M}^+$. Contradiction. □
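
The grouping argument in this proof can be illustrated with a small sketch. We assume, for illustration only, that the order-compatibility facts of $\mathcal{M}^+$ between single attributes are given as a list of unordered pairs; the function name `two_row_swap` and the value assignment (0 then 1 for A's group, 1 then 0 for B's group, 0 in both rows elsewhere) are our own choices consistent with the description above.

```python
def two_row_swap(attrs, compatible, A, B):
    """Build a two-row swap for A, B, or return None if a chain links them.

    compatible: pairs (C, D) of single attributes with C ~ D assumed in M+.
    """
    neighbours = {a: set() for a in attrs}
    for c, d in compatible:
        neighbours[c].add(d)
        neighbours[d].add(c)

    def group(start):
        # Attributes reachable from start through order-compatibility.
        seen, stack = {start}, [start]
        while stack:
            for n in neighbours[stack.pop()]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    group_a = group(A)
    if B in group_a:
        return None  # a chain A ~ E1 ~ ... ~ En ~ B exists
    group_b = group(B)

    # Remaining attributes keep the same value in both rows.
    row1 = {a: 0 for a in attrs}
    row2 = {a: 0 for a in attrs}
    for a in group_a:
        row1[a], row2[a] = 0, 1
    for b in group_b:
        row1[b], row2[b] = 1, 0
    return [row1, row2]


# Hypothetical input: A ~ C and B ~ D are known; E is unrelated.
swap_rows = two_row_swap("ABCDE", [("A", "C"), ("B", "D")], "A", "B")
blocked = two_row_swap("ABC", [("A", "C"), ("C", "B")], "A", "B")
```

When a chain of order-compatible attributes connects A and B, the sketch returns `None`, mirroring the observation that the two rows could not be built in that case.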

Our proof obligation for swap($\mathcal{M}$), that it does not falsify any OD
in $\mathcal{M}^+$, is proved in the following lemma.

Lemma 13. (swap($\mathcal{M}$) satisfies $\mathcal{M}$). Assuming Hypothesis 1,
for all $\mathcal{M}$ of K or fewer non-constant attributes, swap($\mathcal{M}$) does
not falsify any OD in $\mathcal{M}$.

Proof. Hypothesis 1 is the key in proving that the swaps for A, B do not
falsify any OD in $\mathcal{M}^+$. When we consider a pair A and B which
requires a swap in a non-empty context $\mathcal{X}$, we obtain
$\mathcal{M}' = \mathcal{M} \cup \{[] \mapsto X_1, \ldots, [] \mapsto X_n\}$, where $\mathcal{X} = \{X_1, \ldots, X_n\}$. By our
hypothesis, there exists a table t' in split-swap form that
satisfies, and is complete with respect to, $\mathcal{M}'$. As $\mathcal{M}'^+ \supseteq \mathcal{M}^+$,
no OD in $\mathcal{M}^+$ is falsified.

None of the sub-tables falsifies any OD in $\mathcal{M}^+$, by the hypothesis
in the non-empty context and by the soundness of the base cases (empty context
and $K \leq 2$). As the table swap($\mathcal{M}$) is append-normalized,
swap($\mathcal{M}$) does not falsify any OD in $\mathcal{M}^+$. □

Lemma 14. (Satisfies). Every OD that is derivable with respect
to the axiomatization over $\mathcal{M}$ is not falsified by the table t.

Proof. The sub-tables split($\mathcal{M}$) and swap($\mathcal{M}$), as we construct
them, satisfy $\mathcal{M}$ (Lemma 10 and Lemma 13,
respectively). If neither split($\mathcal{M}$) nor swap($\mathcal{M}$) falsifies any OD
in $\mathcal{M}^+$, then t, as split($\mathcal{M}$) append swap($\mathcal{M}$), cannot falsify any
OD in $\mathcal{M}^+$ either (see Lemma 9). □

Lemma 15. (complete). Assuming Hypothesis 1 for all $\mathcal{M}$
constructed over K or fewer attributes, given any $\mathcal{M}$ constructed
over K + 1 attributes, none of which is a constant with respect to $\mathcal{M}$
(Definition 18), the table t = split($\mathcal{M}$) append swap($\mathcal{M}$) is
complete with respect to $\mathcal{M}$.

Proof. Assume $X \mapsto Y$, over only non-constant attributes, is in
the complement of $\mathcal{M}^+$ ($X \mapsto Y \notin \mathcal{M}^+$). Theorem 15 tells us that
order dependency $X \mapsto Y$ holds iff $X \mapsto XY$ and $XY \leftrightarrow YX$.

Case 1.

$X \mapsto XY \notin \mathcal{M}^+$. We have already proven that for the scenario with
$X \mapsto XY$ (FD) we are always complete (Theorem 16).

Case 2.

$X \mapsto Y \notin \mathcal{M}^+$, but $X \mapsto XY \in \mathcal{M}^+$. By Theorem 15, $X \sim Y \notin \mathcal{M}^+$.
Find longest PA prefixing X such that:

  1. $P \sim Y \in \mathcal{M}^+$
  2. $PA \sim Y \notin \mathcal{M}^+$

Find longest QB prefixing Y such that:

  3. $PA \sim Q \in \mathcal{M}^+$
  4. $PA \sim QB \notin \mathcal{M}^+$
  5. $P \sim Q \in \mathcal{M}^+$   [Downward Closure(1)]
  6. $P \sim QB \in \mathcal{M}^+$   [Downward Closure(1)]
  7. $PAQB \leftrightarrow QPAB \in \mathcal{M}^+$   [Shift(3, [$B \leftrightarrow B$])]
  8. $PAQB \leftrightarrow PQAB \in \mathcal{M}^+$   [Replace(5)]
  9. $QBPA \leftrightarrow PQBA \in \mathcal{M}^+$   [Shift(6, [$A \leftrightarrow A$])]
  10. $PAQB \leftrightarrow QBPA \notin \mathcal{M}^+$   [(4)]
  11. $PQAB \leftrightarrow PQBA \notin \mathcal{M}^+$   [Transitivity(8, 9, 10)]
  12. $PQA \sim PQB \notin \mathcal{M}^+$   [(11)]

A and B have a swap within the context $\mathcal{W}$ = set(PQ). In
constructing swap($\mathcal{M}$), we considered all maximal contexts for
A, B for which a swap is needed. Hence, we considered some
superset $\mathcal{V} \supseteq \mathcal{W}$. If $\mathcal{V} \neq []$, a sub-table that satisfies, and is
complete with respect to, $\mathcal{M} \cup \{[] \mapsto V_1, \ldots, [] \mapsto V_n\}$, where
$\mathcal{V} = \{V_1, \ldots, V_n\}$, is appended in swap($\mathcal{M}$). This falsifies $WA \sim
WB$, for all lists W that order the attributes of $\mathcal{W}$ (thus, falsifies
$X \mapsto Y$). Else, if $\mathcal{V} = []$, we appended a swap s, t as in Figure 9,
which falsifies $A \sim B$ ($[]A \sim []B$). □

Theorem 17. (soundness and completeness). The set of the OD
axioms J = {OD1-OD6} is sound and complete.

Proof.

Base case: $\mathcal{M}$ with $K \leq 2$ attributes is proved by Lemma 11.
Assume Hypothesis 1 for all $\mathcal{M}$ composed over K or fewer
attributes.

Induction step: Consider an $\mathcal{M}$ over K + 1 attributes.

Case 1.

$\mathcal{M}$ contains constant attributes (Definition 18). Let $\mathcal{M}'$ be $\mathcal{M}$
with these constant attributes removed. $\mathcal{M}'$ has K or fewer
attributes. By the induction hypothesis (Hypothesis 1), there is r'
which satisfies, and is complete with respect to, $\mathcal{M}'$. Lemma 8
guarantees we can construct r from r' that satisfies, and is complete
with respect to, $\mathcal{M}$.

Case 2.

$\mathcal{M}$ contains no constant attributes. Lemma 15 establishes there
exists an r that satisfies, and is complete with respect to, $\mathcal{M}$. □

5. RELATED WORK 

Ordered sets and lattices have been a subject of research in 
mathematics [5]. In fact, our concept of OD is equivalent to 
order-preserving mapping between two ordered sets. The work in 
mathematics has concentrated on investigating properties of, and 
relationships between, ordered sets rather than among the 
mappings. To the best of our knowledge, no inference system for 
describing relationships between mappings has been proposed. 

Order dependencies were introduced for the first time in the
context of database systems in [7]. However, the type of orders,
and hence the dependencies defined over them, were different from
the ones we presented here. A dependency $\mathcal{X} \mapsto \mathcal{Y}$ holds if order
over the values of each attribute in $\mathcal{X}$ implies an order over the
values of each attribute of $\mathcal{Y}$. (For simplicity, we use the same arrow
for the different types of orders.) In other words, the dependency is
defined over sets of attributes rather than lists. The distinction
between these two types of dependencies was later [13] aptly
described as pointwise versus lexicographical order dependency.
Formally, an instance satisfies a pointwise order dependency
$\mathcal{X} \mapsto \mathcal{Y}$ if, for all tuples s and t, for every attribute A in $\mathcal{X}$, $s_A$ op
$t_A$ implies that for every attribute B in $\mathcal{Y}$, $s_B$ op $t_B$, where op
$\in \{<, =, >, \leq, \geq\}$. In [8] a sound and complete set of axioms for
such dependencies is defined together with an analysis of the
complexity of determining logical implication. An application of
the dependencies for an improved index design is presented in [6].

Dependencies defined over lexicographically ordered domains 
were introduced in [13] under the name lexicographically ordered 
functional dependencies. Two other papers [14], [15] by the same 
author develop a theory behind both lexicographical as well as 
pointwise dependencies (the latter were somewhat simpler than 
the dependencies defined in [7]). A set of axioms (proved to be 
sound and complete) is introduced for pointwise dependencies, 
but - interestingly - not for lexicographical dependencies. Only a 
chase procedure is defined for the latter. An extension of 
relational algebra to ordered domains is presented in [15]. 

Sorting is at the heart of many database operations: sort-merge 
join, index generation, duplicate elimination, ordering the output 
through the SQL order-by operator, etc. The importance of 
sorted sets for query optimization and processing was 
recognized very early on. Right from the start, the query optimizer 
of System R [16] paid particular attention to interesting orders by 
keeping track of all such ordered sets throughout the process of 
query optimization. In more recent research, [8] and [10] explored 
the use of sorted sets for executing nested queries. The importance 
of sorted sets has prompted researchers to look beyond the sets 
that have been explicitly generated. Thus, [12] shows how to 
discover sorted sets created as generated columns via algebraic 
expressions. (In DB2, a generated column is a column that can be 
computed from other columns in the schema.) 

For example, if column A is sorted, so is the generated column G 
defined as G = A/100 + A - 3 (that is, A ↦ G). We show in 
[18] how to use relationships between sorted attributes discovered 
by reasoning over the physical schema. The axiomatization 
presented here provides a formal way of reasoning about, and 
discovering, previously unknown (or hidden) sorted sets. Based on 
this work, many other optimization techniques can also be adapted. 
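
The generated-column example can be checked on sample data. The sketch below assumes that A holds integers and that A/100 is integer division (modeled here with Python's floor division); the helper names and the sample values are hypothetical.

```python
def generated_g(a: int) -> int:
    """Generated column G = A/100 + A - 3, with A/100 taken as integer division (assumption)."""
    return a // 100 + a - 3


def is_sorted(values) -> bool:
    """True if the sequence is in non-decreasing order."""
    return all(x <= y for x, y in zip(values, values[1:]))


# Sort some sample values of A, then compute G in that same row order.
a_values = sorted([250, 7, 199, 100, 42, 7])
g_values = [generated_g(a) for a in a_values]

# G is non-decreasing in A, so a stream ordered by A is also ordered by G.
g_is_sorted = is_sorted(g_values)
```

Because G never decreases as A grows, an index or sort on A can also serve an order-by on G without a second sort.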

6. CONCLUSIONS AND FUTURE WORK 

Ordering permeates databases, to such an extent that we take it for 
granted. It appears in many queries and is relatively expensive to 
perform. The goal of this paper was to develop a theory behind 
dependencies over lexicographically ordered sets. To the best of 
our knowledge, this is the first attempt at an axiomatization for 
such dependencies. We have shown that ODs subsume FDs and that 
our inference rules for ODs are sound and complete. 

The story of order dependencies is far from over. Future work in 
this area should pursue two lines of research: on the one hand, 
further investigation of the theoretical questions; on the other 
hand, applications of the theoretical framework in a practical 
database setting. We plan to pursue the following directions. 

• One of the major practical applications on which we are currently 
working is a theorem prover [20]. Given a set of order 
dependencies M and an arbitrary dependency X ↦ Y, we 
would like to efficiently decide whether M logically 
implies X ↦ Y. Such a theorem prover would be a useful tool 
in query optimization. 

• Integrity constraints have been widely used in query 
optimization through query rewrites. For example, functional 
dependencies have been shown to be useful in simplifying 
queries with distinct, order by, and group by 
operations [17], whereas inclusion dependencies can be used 
to remove certain joins over primary and foreign keys [4]. We 
believe that ODs can be used in similar ways to simplify 
queries with order by operations. 

• We are exploring the use of ODs for database design [2]. FDs 
are by far the most common integrity constraints in the real 
world. The notion of the key derived from a given set of FDs 
is fundamental to the relational model. The determination of 
ODs might be an important part of designing databases in the 
relational model, too. It can be used in database normalization 
and denormalization. Order dependencies can reveal 
redundancies that cannot be detected using functional 
dependencies alone. It would be an interesting research topic 
to extend the results obtained there to the design of relational 
databases. 